
EP0335562 



Publication Title: 

Architecture and organization of a high performance metropolitan area 
telecommunications packet network. 



Abstract: 

Abstract of EP0335562

A high capacity metropolitan area network (MAN) is described. Data traffic from 
users is connected to data concentrators at the edge of the network, and is 
transmitted over fiber optic data links to a hub where the data is switched. The
hub includes a plurality of data switching modules, each having a control means, 
and each connected to a distributed control space division switch. 
Advantageously, the data switching modules, whose inputs are connected to the 
concentrators, perform all checking and routing functions, while the 1024x1024 
maximum size space division switch, whose outputs are connected to the 
concentrators, provides a large fan-out distribution network for reaching many 
concentrators from each data switching module. Distributed control of the space 
division switch permits several million connection and disconnection actions to be 
performed each second, while the pipelined and parallel operation within the
control means permits each of the 256 switching modules to process at least 
50,000 transactions per second. The data switching modules chain groups of 
incoming packets destined for a common outlet of the space division switch so 
that only one connection in that switch is required for transmitting each group of 
chained packets from a data switching module to a concentrator. MAN provides 
security features including a port identification supplied by the data 
concentrators, and a check that each packet is from an authorized source user, 
transmitting on a port associated with that user, to an authorized destination user 
that is in the same group (virtual network) as the source user. This arrangement
can also be used to switch voice packets, using a voice interface such as a 
digital switch and a digital voice signal to voice packet converter. In accordance 
with one embodiment of the invention, a packet switch is used for switching voice 
packet outputs of the data switching modules and a circuit switch, such as the 
space division switch, is used for switching data packet outputs. In accordance 
with another embodiment, voice packets are switched from the data switching 
modules through the space division switch to a small group of data switching 
modules, which further switch the voice packets through the circuit switch to a 
destination concentrator.

ARCHITECTURE AND ORGANIZATION OF A HIGH PERFORMANCE METROPOLITAN AREA TELECOMMUNICATIONS
PACKET NETWORK

Technical Field 

This invention relates to packetized data and voice networks.



In data processing systems involving a large amount of distributed computing, featuring large numbers
of computers and including increasing numbers of personal computers, workstations, and data bases, it is
frequently necessary to exchange a great deal of data among these data processing systems. These
exchanges require communications networks. Such networks, referred to as metropolitan area networks,
when used for interconnecting data processing systems in an area beyond the geographical scope of local
area networks but less than the geographical scope of wide area networks, require data networks capable of
transmitting data and telecommunications traffic at a very high bit rate with low latency.

One type of metropolitan area network is a network composed of one or more interconnected data rings
such as the FDDI (Fiber Distributed Data Interface) network. The basic element of the FDDI network is a
data ring capable of transmitting data at 80 megabits/second to user nodes connected to each such ring.
These rings may be interconnected by providing inter-ring nodes which allow a transfer of data from one
ring to another.

Integrated telephone voice and data switching systems are becoming available for offering customers
integrated services digital network (ISDN) service. In such systems, data is frequently switched by switching
data packets using packet switching techniques. The use of packet switching techniques for also switching
voice signals converted into packets has been suggested, for example, in J. S. Turner: U.S. Patent
4,481,945 (Turner). Such arrangements offer the opportunity to take advantage of the high speed of modern
microelectronic circuitry.

A problem of such data and voice networks is that if there is no predictable community of interest
among the user stations or if there is a high community of interest among stations that are geographically
far apart, much of the data traffic must be transmitted over several rings, thus decreasing the data transfer
speed and limiting the total data bandwidth of the metropolitan area network. Further, such networks
encounter a high data latency because each node on the ring in a metropolitan area network introduces
delay; in a network having rings with many nodes and having many messages which require transmission
over several rings, the delay in transmitting a data message from one station to another can be
unacceptably long. There is no satisfactory large data network having low latency for the transmission of
data messages between any pair of terminals connected to the network and having the capability of
transmitting high priority data messages with especially low latency. Reliability is another problem
encountered in such networks. Because all nodes of a ring must work properly for any message to be sent
around the entire ring, it is necessary to provide repair access to each node. The provision of repair access
can add substantial delay at each node, thus increasing the latency of data transmitted over the network; in
a typical installation each node is brought to a wiring closet so that the node may be bypassed at a readily
accessible point. A recognized problem in the prior art, therefore, is that there is no data network capable of
serving a metropolitan area, having low data latency between any pair of terminals and having a very high
total data transfer rate, that is also capable of serving voice terminals, stations and data bases with
unpredictable and varying communities of interest.



The above problems are solved and an advance is made over the prior art in accordance with the
principles of this invention which feature a data distribution stage for chaining data packets destined for a
common outlet of a circuit switch, and a high-speed, low setup time circuit switching stage for switching the
output of the data distribution stage. Advantageously, the circuit switching stage can be quite large, using
present technology, and can therefore allow a very high total data throughput by providing at any instant of
time a large number of separate paths over each of which data can be transmitted at high-speed data
transmission speeds. Advantageously, only data transmission is performed in the circuit switch, thus
permitting a high data throughput rate over each separate path. Advantageously, the distributed processing
performed in the distribution stage allows data messages destined for a particular output link of the circuit
switch to be recognized and chained.

In one embodiment, the circuit switch is a space division switch. Advantageously, the data transmission
rate through each path of such a switch is high, being limited primarily by the characteristics of circuits
connected to the two sides of the switch.

In one embodiment of the invention, user ports are connected to a data concentration switch.
Advantageously, the use of an initial stage of data concentration allows the characteristics of different types
of users to be matched to the standard data rate of a data transmission medium such as an optic fiber for
connecting the data concentration switch to the data distribution switch. Advantageously, the delays to any
user in using the network are limited to delay associated with the concentration stage, plus delay for
buffering messages and setting up a connection in the central space division stage, plus propagation delay.
Delay is limited to buffering and transmission propagation delay if the concentrator, at the time a user is
transmitting a message, has bandwidth available.

A large variety of different kinds of users may be attached to the network. These users include:
workstations, including simple terminals, personal computers, and engineering design workstations;
computers, including microcomputers, minicomputers, mainframes, and supercomputers, including the
computers of a distributed computing system; data base servers for accessing large data bases; computer
servers for performing special types of operations such as floating point arithmetic or matrix operations;
gateway ports for accessing other networks; voice packet assemblers/disassemblers for communicating
telephone signals; and special interconnection facilities for interconnecting two or more metropolitan area
networks.

In this embodiment of the invention, the output of each concentration source multiplexer is transmitted
to the distribution stage where messages destined for each destination demultiplexer connected to user
input ports are buffered in chained blocks of memory. The output of the distribution stage, representing
messages for a given destination demultiplexer, is then switched by the space division switching stage
directly to that demultiplexer. Advantageously, in this arrangement, data is buffered only in three places: in
a user system to wait for data transmission resources in the concentrator; in the distribution stage to
assemble data for each destination demultiplexer; and in an interface to a user in order to collect all data
messages destined for that user.

In one embodiment of the invention, data packets from a plurality of user systems are concentrated on
to a group of high-speed data links connected to the data switching hub. If the first packet that is destined
for a particular output of the circuit switch is a high priority packet, then the request for a connection to the
destination of that packet becomes a high priority request and is honored before other requests to the
circuit switch. Advantageously, this arrangement gives a very fast response time to all packets under normal
load and gives a fast response to priority packets even under high overload.
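
The priority rule described above can be illustrated with a minimal sketch. It is not the circuit switch controller of the specification: the class and method names are hypothetical, two priority levels are assumed, and requests of equal priority are assumed to be served in arrival order.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1  # assumed two-level priority; the lower value is served first


class ConnectionRequestQueue:
    """Hypothetical model of pending requests for circuit switch connections."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # preserves arrival order within a level

    def request(self, outlet, chained_packets):
        # The request inherits the priority of the FIRST packet in the chain.
        first_priority = chained_packets[0]["priority"]
        heapq.heappush(self._heap, (first_priority, next(self._order), outlet))

    def next_connection(self):
        # Return the outlet whose request should be honored next.
        return heapq.heappop(self._heap)[2]


queue = ConnectionRequestQueue()
queue.request(outlet=7, chained_packets=[{"priority": NORMAL}, {"priority": HIGH}])
queue.request(outlet=3, chained_packets=[{"priority": HIGH}, {"priority": NORMAL}])
queue.request(outlet=9, chained_packets=[{"priority": NORMAL}])
served = [queue.next_connection() for _ in range(3)]
```

In this sketch the request for outlet 3 is honored first because the packet at the head of its chain is high priority; the high priority packet queued behind a normal packet for outlet 7 does not promote that request.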

Packetized voice signals are switched using a data switching module that includes a group of banks of
memory for storing consecutive words of a packet, a group of packet input and a group of packet output
handlers and means for distributing data from each of the input handlers to the memory and from the
memory to each of the output handlers.

In this specific embodiment, the basic operating speed of each fiber optic link is about 150
megabits/second. Each data distribution switch of the distribution stage has four optic fiber inputs and four
optic fiber outputs. Up to 250 distribution switches can be provided for one metropolitan area network. The
space division switch, therefore, has up to 1,000 input fiber optic links and 1,000 output fiber optic links. As
explained above, these output fiber optic links are connected to demultiplexers for accessing the input user
ports.
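
The link counts quoted above follow from simple multiplication. The short sketch below reproduces that arithmetic; the function names are illustrative only, and the aggregate figure is an upper bound that assumes every link carries data simultaneously at the nominal rate.

```python
LINK_RATE_MBPS = 150             # approximate rate of each fiber optic link
FIBERS_PER_SWITCH = 4            # optic fiber inputs (and outputs) per distribution switch
MAX_DISTRIBUTION_SWITCHES = 250  # maximum number of distribution switches quoted above


def switch_ports(num_switches, fibers_per_switch=FIBERS_PER_SWITCH):
    """Number of input (equally, output) links on the space division switch."""
    return num_switches * fibers_per_switch


def aggregate_rate_gbps(num_links, link_rate_mbps=LINK_RATE_MBPS):
    """Upper bound on total throughput if every link is busy at once."""
    return num_links * link_rate_mbps / 1000


ports = switch_ports(MAX_DISTRIBUTION_SWITCHES)
total_gbps = aggregate_rate_gbps(ports)
```

With the figures of this embodiment, `ports` is 1,000 and `total_gbps` is 150.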

In an alternative embodiment, data packets representing voice signals (voice packets) are switched from
the data switching modules through a data switch in order to avoid the circuit setup time limitations of a
circuit switch. High priority data packets, and, optionally, any single packet messages, can also be switched
through the data switch. Advantageously, the relatively short voice packets can be separated from the data
packets representing data, the latter having less rigorous switching delay requirements and, on average,
being much longer.

In another alternative embodiment, groups of voice packets are switched from a data and voice packet
switch through the space division switch to ones of a group of specialist voice packet switching modules
which collect and further switch the groups of voice packets through the circuit switch for connection to the
destination. Advantageously, in such an arrangement voice packets from a source voice and data packet
switch destined for a group of destinations can first be assembled into groups of packets destined for a
particular specialist voice packet switch, and voice packets from many voice and data packet switches can
then be assembled in each voice packet switch into groups destined for a particular destination.
Advantageously, the number of circuit switch connections required per voice packet switching interval (i.e., the
interval between successive voice packets to a particular receiving customer station) is sharply reduced
from the number of connections required for switching such voice packets directly from an initial data
switching module to an outlet of the circuit switch for transmission to a destination.

In one embodiment, a local switch is part of the interface between customer voice signals, in analog or
digital form, and a packet switching system. The digital output signals from the voice switch are placed on
trunks which are connected to a packet assembler/disassembler (PAD) for packetizing and unpacketizing
these signals. Advantageously, such an arrangement permits the complex voice interfaces and control
software of a local switch to be used while offering the advantage of a centralized data switching hub for
distributing the voice traffic widely. Advantageously, in such an arrangement, data signals from customers
can be readily connected to the data switching hub.

For some sources, such as digital private branch exchanges (PBXs), a direct connection is made to the
PAD.

In an alternative embodiment, messages for each destination distribution unit are collected within each
source distribution unit. These messages are then sent from the source distribution unit to the destination
distribution unit through the space division switch. Each destination distribution unit then distributes
received messages to the destination demultiplexer or, for a high-speed destination user, directly to the
destination user.



Brief Description of the Drawing

FIG. 1 is a graphic representation of the characteristics of the type of communications traffic in a
metropolitan area network.

FIG. 2 is a high level block diagram of an exemplary metropolitan area network (referred to herein as
MAN) including typical input user stations that communicate via such a network.

FIG. 3 is a more detailed block diagram of the hub of MAN and the units communicating with that
hub.

FIGS. 4 and 5 are block diagrams of MAN illustrating how data flows from input user systems to the
hub of MAN and back to output user systems.

FIG. 6 is a simplified illustrative example of a type of network which can be used as a circuit switch
in the hub of MAN.

FIG. 7 is a block diagram of an illustrative embodiment of a MAN circuit switch and its associated
control network.

FIGS. 8 and 9 are flowcharts representing the flow of requests from the data distribution stage of the
hub to the controllers of the circuit switch of the hub.

FIG. 10 is a block diagram of one data distribution switch of a hub.

FIGS. 11-14 are block diagrams and data layouts of portions of the data distribution switch of the
hub.

FIG. 15 is a block diagram of an operation, administration, and maintenance (OA&M) system for
controlling the data distribution stage of the hub.

FIG. 16 is a block diagram of an interface module for interfacing between end user systems and the
hub.

FIG. 17 is a block diagram of an arrangement for interfacing between an end user system and a
network interface.

FIG. 18 is a block diagram of a typical end user system.

FIG. 19 is a block diagram of a control arrangement for interfacing between an end user system and
the hub of MAN.

FIG. 20 is a layout of a data packet arranged for transmission through MAN, illustrating the MAN
protocol.

FIG. 21 illustrates an alternate arrangement for controlling access from the data distribution switches
to the circuit switch control.

FIG. 22 is a block diagram illustrating arrangements for using MAN to switch voice as well as data.

FIG. 23 illustrates an arrangement for synchronizing data received from the circuit switch by one of
the data distribution switches.

FIGS. 24 and 26 illustrate an alternate arrangement for the hub for switching packetized voice and
data.

FIG. 25 is a block diagram of a MAN circuit switch controller.



General Description 

The Detailed Description of this specification is a description of an exemplary metropolitan area network
(referred to herein as MAN) that incorporates the present invention. Such a network, as shown in FIGS. 2
and 3, includes an outer ring of network interface modules (NIMs) 2 connected by fiber optic links 3 to a hub
1. The hub interconnects data and voice packets from any of the NIMs to any other NIM. The NIMs, in turn,
are connected via interface modules to user devices connected to the network.

The invention embodied in the Detailed Description relates to the hub of the network. While the entire
Detailed Description supports the invention as claimed, that portion which deals with FIGS. 3-5 and 10-15 is
especially pertinent to the architecture of the hub.



Detailed Description 

1 INTRODUCTION 

Data networks often are classified by their size and scope of ownership. Local area networks (LANs) are
usually owned by a single organization and have a reach of a few kilometers. They interconnect tens to
hundreds of terminals, computers, and other end user systems (EUSs). At the other extreme are wide area
networks (WANs) spanning continents, owned by common carriers, and interconnecting tens of thousands
of EUSs. Between these extremes other data networks have been identified whose scope ranges from a
campus to a metropolitan area. The high performance metropolitan area network to be described herein will
be referred to as MAN. A table of acronyms and abbreviations is found in Appendix A.

Metropolitan area networks serve a variety of EUSs ranging from simple reporting devices and low
intelligence terminals through personal computers to large mainframes and supercomputers. The demands
that these EUSs place on a network vary widely. Some may issue messages infrequently while others may
issue many messages each second. Some messages may be only a few bytes while others may be files of
millions of bytes. Some EUSs may require delivery any time within the next few hours while others may
require delivery within microseconds.

This invention of a metropolitan area network is a computer and telephone communications network that
has been designed for transmitting broadband low latency data which retains and indeed exceeds the
performance characteristics of the highest performance local area networks. A metropolitan area network has
size characteristics similar to those of a class 5 or end-office telephone central office; consequently, with
respect to size, a metropolitan area network can be thought of as an end-office for data. The exemplary
embodiment of the invention, hereinafter called MAN, was designed with this in mind. However, MAN also
fits well either as an adjunct to or as part of a switch module for an end-office, thus supporting broadband
Integrated Services Digital Network (ISDN) services. MAN can also be effective as either a local area or
campus area network. It is able to grow gracefully from a small LAN through campus sized networks to a
full MAN.

The rapid proliferation of workstations and their servers, and the growth of distributed computing are
major factors that motivated the design of this invention. MAN was designed to provide networking for tens
of thousands of diskless workstations and servers and other computers over tens of kilometers, where each
user has tens to hundreds of simultaneous and different associations with other computers on the network.
Each networked computer can concurrently generate tens to hundreds of messages per second, and
require I/O rates of tens to hundreds of millions of bits/second (Mbps). Message sizes may range from
hundreds of bits to millions of bits. With this level of performance, MAN is capable of supporting remote
procedure calls, interobject communications, remote demand paging, remote swapping, file transfer, and
computer graphics. The goal is to move most messages (or transactions as they will be referred to
henceforth) from an EUS memory to another EUS memory within less than a millisecond for small
transactions and within a few milliseconds for large transactions. FIG. 1 classifies transaction types and
shows desired EUS response times as a function of both transaction type and size: simple (i.e., low
intelligence) terminals 70, remote procedure calls (RPCs) and interobject communications (IOCs) 72,
demand paging 74, memory swapping 76, animated computer graphics 78, computer graphics still pictures
80, file transfers 82, and packetized voice 84. Meeting the response time/transaction speeds of FIG. 1
represents part of the goals of the MAN network. As a calibration, lines of constant bit rate are shown where
the bit rate is likely to dominate the response time. MAN has an aggregate bit rate of 150 gigabits per
second and can handle 20 million network transactions per second with the exemplary choice of the
processor elements shown in FIG. 14. Furthermore, it has been designed to handle traffic overloads
gracefully.

MAN is a network which performs switching and routing as many systems do, but also addresses a
myriad of other necessary functions such as error handling, user interfacing, and the like. Significant privacy
and security features in MAN are provided by an authentication capability. This capability prevents
unauthorized network use, enables usage-sensitive billing, and provides non-forgeable source identification
for all information. Capability also exists for defining virtual private networks.

MAN is a transaction-oriented (i.e., connectionless) network. It does not need to incur the overhead of
establishing or maintaining connections, although a connection veneer can be added in a straightforward
fashion if desired.

MAN can also be used for switching packetized voice. Because of the short delay in traversing the
network, the priority which may be given to the transmission of single packet entities, and the low variation
of delay when the network is not heavily loaded, voice or a mixture of voice and data can be readily
supported by MAN. For clarity, the term data as used hereinafter includes digital data representing voice
signals, as well as digital data representing commands, numerical data, graphics, programs, data files and
other contents of memory.

MAN, though not yet completely built, has been extensively simulated. Many of the capacity estimates
presented hereinafter are based on these simulations.

2 ARCHITECTURE AND OPERATION



2.1 Architecture 

The MAN network is a hierarchical star architecture with two or three levels depending upon how
closely one looks at the topology. FIG. 2 shows the network as consisting of a switching center called a hub
1 linked to network interface modules 2 (NIMs) at the edge of the network.

The hub is a very high performance transaction store-and-forward system that gracefully grows from a
small four link system to something very large that is capable of handling over 20 million network
transactions per second and that has an aggregate bit rate of 150 gigabits per second.

Radiating out from the hub for distances of up to tens of kilometers are optical fibers (or alternative data
channels) called external links (XLs) (connect NIM to MINT), each capable of handling full duplex bit rates
on the order of 150 megabits per second. An XL terminates in a NIM.

A NIM, the outer edge of which delineates the edge of the network, acts as a concentrator/demultiplexer
and also identifies network ports. It concentrates when moving information into the network, and
demultiplexes when moving information out of the network. Its purpose in concentrating/demultiplexing is to
interface multiple end user systems 26 (EUSs) to the network in such a way as to use the link efficiently
and cost effectively. Up to 20 EUSs 26 can be supported by each NIM depending upon the EUSs'
networking needs. Examples of such EUSs are the increasingly common advanced function workstations 4
where the burst rates are already in the 10 Mbps range (with the expectation that much faster systems will
soon be available) with average rates orders of magnitude lower. If the EUS needs an average rate that is
closer to its burst rate and the average rates are of the same order of magnitude as that of a NIM, then a
NIM can either provide multiple interfaces to a single EUS 26 or can provide a single interface with the
entire NIM and XL dedicated to that EUS. Examples of EUSs of this type include large mainframes 5 and
file servers 6 for the above workstations, local area networks such as ETHERNET® 8 and high performance
local area networks 7 such as Proteon® 80, an 80 MBit token ring manufactured by Proteon Corp., or a
system using a fiber distributed data interface (FDDI), an evolving American National Standards Institute
(ANSI) standard protocol ring interface. In the latter two cases, the LAN itself may do the concentration and
the NIM then degenerates to a single port network interface module. Lower performance local area networks
such as ETHERNET 8 and IBM token rings may not need all of the capability that an entire NIM provides.
In these cases, the LAN, even though it concentrates, may connect to a port 8 on a multiport NIM.

Within each EUS there is a user interface module (UIM) 13. This unit serves as a high bit rate direct
memory access port for the EUS and as a buffer for transactions received from the network. It also off-loads
the EUS from MAN interface protocol concerns. Closely associated with the UIM is the MAN EUS-resident
driver. It works with the UIM to format outgoing transactions, receive incoming transactions, implement
protocols, and interface with the EUS's operating system.

A closer inspection (see FIG. 3) of the hub reveals two different functional units: a MAN switch (MANS)
10 and one or more memory interface modules 11 (MINTs). Each MINT is connected to up to four NIMs via
XLs 3 and thus can accommodate up to 80 EUSs. The choice of four NIMs per MINT is based upon a
number of factors including transaction handling capacity, buffer memory size within the MINT, growability
of the network, failure group size, and aggregate bit rate.

Each MINT is connected to the MANS by four internal links 12 (ILs) (connect MINT and MAN switch),
one of which is shown for each of the MINTs in FIG. 3. The reason for four links in this case is different
than it is for the XLs. Here multiple links are necessary because the MINT will normally be sending
information through the MANS to multiple destinations concurrently; a single IL would present a bottleneck.
The choice of 4 ILs (as well as many other design choices of a similar nature) was made on the basis of
extensive analytical and simulation modeling. The ILs run at the same bit rate as the external links but are
very short since the entire hub is co-located.

The smallest hub consists of one MINT with the ILs looped back and no switch. A network based upon
this hub includes up to four NIMs and accommodates up to 80 EUSs. The largest hub that is currently
envisioned consists of 256 MINTs and a 1024 x 1024 MANS. This hub accommodates 1024 NIMs and up to
20,000 EUSs. By adding MINTs and growing the MANS, the hub and ultimately the entire network grows
very gracefully.
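
The growth path from the smallest to the largest hub can be tabulated with a few lines of arithmetic. The following is only a sizing sketch built from the per-unit limits stated above; the function name and dictionary keys are hypothetical.

```python
NIMS_PER_MINT = 4   # each MINT connects to up to four NIMs
EUSS_PER_NIM = 20   # each NIM supports up to 20 EUSs
ILS_PER_MINT = 4    # internal links from each MINT to the MANS


def hub_capacity(num_mints):
    """Upper bounds on the size of a hub built from num_mints MINTs."""
    nims = num_mints * NIMS_PER_MINT
    # A single MINT loops its ILs back, so no switch ports are needed.
    mans_ports = num_mints * ILS_PER_MINT if num_mints > 1 else 0
    return {"nims": nims, "euss": nims * EUSS_PER_NIM, "mans_ports": mans_ports}


smallest = hub_capacity(1)
largest = hub_capacity(256)
```

For 256 MINTs the product of the per-unit limits is 20,480 EUSs, which the text rounds to "up to 20,000".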

2.1.1 LUWUs, Packets, SUWUs, and Transactions

Before going further several terms need to be discussed. EUS transactions are transfers of units of EUS
information that are meaningful to the EUS. Such transactions might be a remote procedure call consisting
of a few bytes or the transfer of a 10 megabyte database. MAN recognizes two EUS transaction unit sizes
that are called long user work units (LUWUs) and short user work units (SUWUs) for the purposes of this
description. While the delimiting size is easily engineerable, usually transaction units of a couple of
thousand bits or less are considered SUWUs while larger transaction units are LUWUs. Packets are given
priority within the network to reduce response time based upon criteria shown in FIG. 1 where it can be
seen that the smaller EUS transaction units usually need faster EUS transaction response times. Packets
are kept intact as a single frame or packet as they move through the network. LUWUs are fragmented into
frames or packets, called packets hereinafter, by the transmitting UIM. Packets and SUWUs are sometimes
collectively referred to as network transaction units.
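
The distinction between SUWUs and fragmented LUWUs can be illustrated as follows. This is a minimal sketch under stated assumptions: the 2000 bit delimiting size stands in for "a couple of thousand bits", and the 8000 bit maximum packet size and the function name are hypothetical values chosen only for illustration.

```python
from typing import List

SUWU_LIMIT_BITS = 2000   # assumed delimiting size ("a couple of thousand bits")
MAX_PACKET_BITS = 8000   # hypothetical maximum packet payload


def to_network_transactions(unit_bits: int) -> List[int]:
    """Return the payload sizes, in bits, of the network transaction units
    produced for one EUS transaction of `unit_bits` bits."""
    if unit_bits <= 0:
        raise ValueError("an EUS transaction must contain data")
    if unit_bits <= SUWU_LIMIT_BITS:
        return [unit_bits]  # SUWU: kept intact as a single unit
    # LUWU: fragmented into maximum-size packets plus a shorter final packet
    full, rest = divmod(unit_bits, MAX_PACKET_BITS)
    return [MAX_PACKET_BITS] * full + ([rest] if rest else [])


print(to_network_transactions(800))      # one SUWU
print(to_network_transactions(20000))    # a LUWU split into packets
```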

Transfers through the MAN switch are referred to as switch transactions and the units transferred
through the MANS are switch transaction units. They are composed of one or more network transaction
units destined for the same NIM.
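
The composition of switch transaction units can be sketched as a grouping of queued network transaction units by destination NIM, so that one switch setup serves each group. The function name and the representation of a unit as a (destination NIM, label) pair are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple


def chain_by_destination(units: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Group network transaction units by destination NIM.

    Each value is one switch transaction unit: every member shares a NIM,
    so the whole group can cross the MANS on a single connection.
    Arrival order within a group is preserved.
    """
    chains: Dict[int, List[str]] = {}
    for dest_nim, unit in units:
        chains.setdefault(dest_nim, []).append(unit)
    return chains


queued = [(7, "pkt-a"), (3, "suwu-b"), (7, "pkt-c")]
chains = chain_by_destination(queued)
print(chains)   # three units, but only two switch transactions
```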

2.2 Overview

Prior to discussing the operation of MAN, it is useful to provide a brief overview of each major functional
unit within the network. The units described are the UIM 13, NIM 2, MINT 11, MANS 10, end user system
link (connects NIM and UIM) (EUSL) 14, XL 3, and IL 12, respectively. These units are depicted in FIG. 4.



2.2.1 User Interface Module - UIM 13

This module is located within the EUS and often plugs onto an EUS backplane such as a VME® bus
(an IEEE standard bus), an Intel MULTIBUS II®, or a mainframe I/O channel. It is designed to fit on one
printed circuit board for most applications. The UIM 13 connects to the NIM 2 over a duplex optical fiber
link called the EUS link 14 (EUSL), driven by optical transmitters 97 and 85. This link runs at the same
speed as the external link (XL) 3. The UIM has a memory queue 15 used to store information on its way to
the network. Packets and SUWUs are stored and forwarded to the NIM using out-of-band flow control.

By way of contrast, a receive buffer memory 90 must exist to receive information from the network. In
this case entire EUS transactions may sometimes be stored until they can be transferred into End User
System memory. The receive buffer must be capable of dynamic buffer chaining. Partial EUS transactions
may arrive concurrently in an interleaved fashion.

Optical receiver 87 receives signals from optical link 14 for storage in receive buffer memory 90.
Control 25 controls UIM 13, and controls exchange of data between transmit first-in-first-out (FIFO) queue
15 or receive buffer memory 90 and a bus interface for interfacing with bus 92 which connects to end user
system 26. The details of the control of UIM 13 are shown in FIG. 19.



2.2.2 Network Interface Module - NIM 2

A NIM 2 is the part of MAN that is at the edge of the network. A NIM performs six functions: (1)
concentration/demultiplexing, including queuing of packets and SUWUs moving toward the MINT and
external link arbitration, (2) participation in network security using port identification, (3) participation in
congestion control, (4) EUS-to-network control message identification, (5) participation in error handling, and
(6) network interfacing. Small queues 94 in memory similar to those 15 found in the UIM exist for each End
User System. They receive information from the UIM via link 14 and receiver 68 and store it until XL 3 is
available for transmission to the MINT. The outputs of these queues drive a data concentrator 95 which in
turn drives an optical transmitter 96. An external link demand multiplexer exists which services demands for
the use of the XL. The NIM prefixes a port identification number 600 (FIG. 20) to each network transaction
unit flowing toward the MINT. This is used in various ways to provide value added services such as reliable
and non-fraudulent sender identification and billing. This prefix is particularly desirable for ensuring that
members of a virtual network are protected from unauthorized access by outsiders. A check sequence is
processed for error control. The NIM, working with the hub 1, determines congestion status within the
network and controls flow from the UIMs under high congestion conditions. The NIM also provides a
standard physical and logical interface to the network including flow control mechanisms.

Information flowing from the network to the EUS is passed through the NIM via receiver 69, distributed
to the correct UIM by data distributor 86 and sent to destination UIM 13 by transmitter 85 via link 14. No
buffering is done at the NIM.

There are only two types of NIMs. One type (such as shown in FIG. 4 and the upper right of FIG. 3)
concentrates while the other type (shown at the lower right of FIG. 3) does not.



2.2.3 Memory and Interface Module - MINT 11

MINTs are located in the hub. Each MINT 11 consists of: (a) up to four external link handlers 16 (XLHs)
that terminate XLs and also receive signals from the half of the internal link that moves data from the switch
10 to the MINT; (b) four internal link handlers 17 (ILHs) that generate data for the half of the IL that moves
data from a MINT to the switch; (c) a memory 18 for storing data while awaiting a path from the MINT
through the switch to the destination NIM; (d) a Data Transport Ring 19 that moves data between the link
handlers and the memory and also carries MINT control information; and (e) a control unit 20.

All functional units within the MINT are designed to accommodate the peak aggregate bit rate for data
moving concurrently into and out of the MINT. Thus the ring, which is synchronous, has a set of reserved
slots for moving information from each XLH to memory and another set of reserved slots for moving
information from memory to each ILH. It has a read plus write bit rate of over 1.5 Gbps. The memory is 512
bits wide so that an adequate memory bit rate can be achieved with components having reasonable access
times. The size of the memory (16 Mbytes) can be kept small because the occupancy time of information in
the memory is also small (about 0.57 milliseconds under full network load). However, this is an engineerable
number that can be adjusted if necessary.

The XLHs are bi-directional but not symmetric. Information moving from NIM to MINT is stored in MINT
memory. Header information is copied by the XLH and sent to the MINT control for processing. In contrast,
information moving from the switch 10 toward a NIM is not stored in the MINT but simply passes through
the MINT, without being processed, on its way from MANS 10 output to a destination NIM 2. Due to
variable path lengths in the switch, the information leaving the MANS 10 is out of phase with respect to the
XL. A phase alignment and scrambler circuit (described in section M) must align the data before
transmission to the NIM can occur. Section 4.6 describes the internal link handler (ILH).

The MINT performs a variety of functions including (1) some of the overall routing within the network,
(2) participation in user validation, (3) participation in network security, (4) queue management, (5) buffering
of network transactions, (6) address translation, (7) participation in congestion control, and (8) the generation
of operation, administration, and maintenance (OA&M) primitives.

The control for the MINT is a data flow processing system tailored to the MINT control algorithms. Each
MINT is capable of processing up to 80,000 network transactions per second. A fully provisioned hub with
250 MINTs can therefore process 20 million network transactions per second. This is discussed further in
section 2.3.



2.2.4 MAN Switch - MANS 10

The MANS consists of two main parts: (a) the fabric 21 through which information passes and (b) the
control 22 for that fabric. The control allows the switch to be set up in about 50 microseconds. Special
properties of the fabric allow the control to be decomposed into completely independent sub-controllers that
can operate in parallel. Additionally, each sub-controller can be pipelined. Thus, not only is the setup time
very fast but many paths can be set up concurrently and the "setup throughput" can be made high enough
to accommodate high request rates from large numbers of MINTs. The MANS can be made in various sizes
ranging from 16x16 (handling four MINTs) to 1024 x 1024 (handling 256 MINTs).



2.2.5 End User System Link - EUSL 14

The end user system link 14 connects the NIM 2 to the UIM 13 that resides within the end user's
equipment. It is a full duplex optical fiber link that runs at the same rate and in synchronism with the
external link on the other side of the NIM. It is dedicated to the EUS to which it is connected. The length of
the EUSL is intended to be on the order of meters to 10s of meters. However, there is no reason why it
couldn't be longer if economics allow it.

The basic format and data rate for the EUSL for the present embodiment of the invention was chosen to
be the same as that of the Metrobus Lightwave System OS-1 link. Whatever link layer data transmission
standard is eventually adopted would be used in later embodiments of MAN.

2.2.6 External Links - XL 3

The external link (XL) 3 connects the NIM to the MINT. It is also a full duplex synchronous optical fiber
link. It is used in a demand multiplexed fashion by the end user systems connected to its NIM. The length
of the XL is intended to be on the order of 10s of kilometers. Demand multiplexing is used for economic
reasons. It employs the Metrobus OS-1 format and data rate.



2.2.7 Internal Links - IL 24

The internal link 24 provides connectivity between a MINT and the MAN switch. It is a unidirectional
semi-synchronous link that retains frequency but loses the synchronous phase relationship as it passes
through the MANS 10. The length of the IL 24 is on the order of meters but could be much longer if
economics allowed. The bit rate of the IL is the same as that of OS-1. The format, however, has only limited
similarity to OS-1 because of the need to resynchronize the data.



2.3 Software Overview

Using a workstation/server paradigm, each end user system connected to MAN is able to generate over
50 EUS transactions per second consisting of LUWUs and SUWUs. This translates into about 400 network
transactions per second (packets and SUWUs). With up to 20 EUSs per NIM, each NIM must be capable of
handling up to 8000 network transactions per second with each MINT handling up to four times this amount
or 32000 network transactions per second. These are average or sustained rates. Burst conditions may
substantially increase "instantaneous" rates for a single EUS 26. Averaging over a number of EUSs will,
however, smooth out individual EUS bursts. Thus while each NIM port must deal with bursts of considerably
more than 50 network transactions per second, NIMs (2) and XLs (3) are likely to see only moderate bursts.
This is even more true of MINTs 11, each of which serves 4 NIMs. The MAN switch 10 must pass an
average of 8 million network transactions per second, but the switch controller does not need to process
this many switch requests since the design of the MINT control allows multiple packets and SUWUs going
to the same destination NIM to be switched with a single switch setup.

A second factor to be considered is network transaction interarrival time. With rates of 150 Mbps and the
smallest network transaction being an SUWU of 1000 bits, two SUWUs could arrive at a NIM or MINT 6.67
microseconds apart. NIMs and MINTs must be able to handle several back-to-back SUWUs on a transient
basis.
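
The sustained rates and the minimum interarrival time quoted in the two preceding paragraphs follow from simple arithmetic, reproduced here as a minimal sketch. The variable names are illustrative only; the numeric inputs are those stated in the text.

```python
# Sustained transaction-rate budget, using the figures stated in the text.
NET_TX_PER_EUS = 400      # network transactions/sec per EUS (packets and SUWUs)
EUS_PER_NIM = 20
NIMS_PER_MINT = 4
MINTS = 256               # largest hub
LINK_BPS = 150e6          # link bit rate
MIN_SUWU_BITS = 1000      # smallest network transaction

nim_rate = NET_TX_PER_EUS * EUS_PER_NIM   # per NIM
mint_rate = nim_rate * NIMS_PER_MINT      # per MINT
switch_rate = mint_rate * MINTS           # through the MANS

# Back-to-back minimum-size SUWUs on one link, in microseconds.
min_interarrival_us = MIN_SUWU_BITS * 1e6 / LINK_BPS

print(nim_rate, mint_rate, switch_rate)
print(round(min_interarrival_us, 2))
```

The switch figure comes to about 8.2 million network transactions per second, which the text rounds to 8 million.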

The control software in the NIMs and especially the MINTs must deal with this severe real-time
transaction processing. The asymmetry and bursty nature of data traffic requires a design capable of
processing peak loads for short periods of time. Thus the transaction control software structure must be
capable of executing many hundreds of millions of CPU instructions per second (100's of MIPs). Moreover,
in MAN, this control software performs a multiplicity of functions including routing of packets and SUWUs,
network port identification, queuing of network transactions destined for the same NIM over up to 1000
NIMs (this means real time maintenance of up to 1000 queues), handling of MANS requests and
acknowledgements, flow control of source EUSs based on complex criteria, network traffic data collection,
congestion control and a myriad of other tasks.

The MAN control software is capable of performing all of the above tasks in real time. The control
software is executed in three major components: NIM control 23, MINT control 20, and MANS control 22.
Associated with these three control components is a fourth control structure 25 within the UIM 13 of the End
User System 26. FIG. 5 shows this arrangement. Each NIM and MINT has its own control unit. The control
units function independently but cooperate closely. This partitioning of control is one of the architectural
mechanisms that makes possible MAN's real-time transaction processing capability. The other mechanism
that allows MAN to handle high transaction rates is the technique of decomposing the control into a logical
array of subfunctions and independently applying processing power to each subfunction. This approach has
been greatly facilitated by the use of Transputer® very large scale integration (VLSI) processor devices
made by INMOS Corp. The technique basically is as follows:

- Decompose the problem into a number of subfunctions.

- Arrange the subfunctions to form a dataflow structure.

- Implement each subfunction as one or more processes.

- Bind sets of processes to processors, arranging the bound processors in the same topology as the
dataflow structure so as to form a dataflow system that will execute the function.

- Iterate as necessary to achieve the real-time performance required.
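
The steps above can be sketched in software as a chain of processes connected by queues. The following minimal sketch uses Python threads and queues purely as stand-ins for Transputer processes and serial links; the function names and the two placeholder subfunctions are hypothetical.

```python
import queue
import threading


def stage(fn, inbox, outbox):
    """Run one subfunction as a process: read, transform, pass downstream."""
    while True:
        item = inbox.get()
        if item is None:       # sentinel: shut down and tell the next stage
            outbox.put(None)
            return
        outbox.put(fn(item))


def run_pipeline(functions, items):
    """Bind each subfunction to its own worker, wired in dataflow order."""
    queues = [queue.Queue() for _ in range(len(functions) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(functions)
    ]
    for worker in workers:
        worker.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(None)

    results = []
    while True:
        out = queues[-1].get()
        if out is None:
            break
        results.append(out)
    for worker in workers:
        worker.join()
    return results


# Two placeholder subfunctions; while stage 2 handles item n, stage 1 can
# already be working on item n+1.
results = run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
print(results)
```

Because each stage has a single worker reading from a first-in-first-out queue, items leave the pipeline in the order in which they entered it.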

Brief descriptions of the functions performed by the NIM, MINT, and MANS (most of which are done by
the software control for those modules) are given in sections 2.2.2 through 2.2.4. Additional information is
given in section 2.4. Detailed descriptions are included later in this description within specific sections
covering these subsystems.



2.3.1 Control Processors

The processors chosen for the system implementation are Transputers from INMOS Corp. These 10
million instructions/second (MIP) reduced instruction set computer (RISC) machines are designed to be
connected in an arbitrary topology over 20 Mbps serial links. Each machine has four links with an input and
output path capable of simultaneous direct memory access (DMA).



2.3.2 MINT Control Performance 

Because of the need to process a large number of transactions per second, the processing of each
transaction is broken into serial sections which form a pipeline. Transactions are fed into this pipeline where
they are processed simultaneously with other transactions at more advanced stages within the pipe. In
addition, there are multiple parallel pipelines each handling unique processing streams simultaneously.
Thus, the required high transaction processing rate, where each transaction requires routing and other
complex servicing, is achieved by breaking the control structure into such a parallel/pipelined fabric of
interconnected processors.

A constraint on MINT control is that any serial processing can take no longer than
1 / (number of transactions per second processed in this pipeline).

A further constraint concerns the burst bandwidth for headers entering the control within an XLH 16. If the
time between successive network units arriving at the XLH is less than
(header size) / (bandwidth into control)
then the XLH must buffer headers. The maximum number of transactions per second assuming uniform
arrival is given by:
(bandwidth into control) / (size of transaction header).

An example based upon the effective bit rate of transputer links and the 40 byte MAN network transaction
header is:

(8.0 Mb/s for control link) / (320 bit header/transaction) = 25,000 transactions/sec per XLH,

or one transaction per XLH every 40 microseconds. Because transaction interarrival times can be less than
this, header buffering is performed in the XLH.
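
The two constraints can be evaluated numerically. The following is a minimal sketch; the function names are hypothetical, and the control link rate is taken as 8.0 Mb/s, the value consistent with the 40 byte header, the 25,000 transactions/sec figure, and the 40 microsecond interval given in the example.

```python
CONTROL_LINK_BPS = 8.0e6   # effective transputer link rate into MINT control
HEADER_BITS = 40 * 8       # 40 byte MAN network transaction header

# Maximum uniform arrival rate of headers per XLH.
max_tx_per_sec = CONTROL_LINK_BPS / HEADER_BITS

# Time to move one header into control, in microseconds.
header_time_us = HEADER_BITS * 1e6 / CONTROL_LINK_BPS


def must_buffer(interarrival_us: float) -> bool:
    """True when units arrive faster than a header can enter control."""
    return interarrival_us < header_time_us


def serial_budget_us(pipeline_tx_per_sec: float) -> float:
    """Longest time any serial section of a pipeline may take."""
    return 1e6 / pipeline_tx_per_sec


print(max_tx_per_sec, header_time_us)
print(must_buffer(6.67))   # back-to-back 1000 bit SUWUs at 150 Mbps
```

Back-to-back minimum-size SUWUs arrive several times faster than headers can be passed to control, which is why the XLH buffers headers.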

The MINT must be capable, within this time, of routing, executing billing primitives, making switch
requests, performing network control, memory management, operation, administration, and maintenance
activities, name serving, and also providing other network services such as yellow page primitives. The
parallel/pipelined nature of MINT control 20 achieves these goals.

As an example, the allocating and freeing of high-speed memory blocks can be processed completely
independently of routing or billing primitives. Transaction flow within a MINT is controlled in a single pipe by
the management of the memory block address used for storing a network transaction unit (i.e., packet or
SUWU). At the first stage of the pipe, memory management allocates free blocks of high-speed MINT
memory. Then, at the next stage, these blocks are paired with the headers and routing translation is done.
Then switch units are collected based on memory blocks sent to common NIMs, and to close the loop the
memory blocks are freed after the blocks' data is transmitted into the MANS. Billing primitives are
simultaneously handled within a different pipe.
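
The life cycle of a memory block address through this pipe can be sketched as follows. This is an illustrative model only: the class name, the method names, and the dictionary used as a routing table are hypothetical, and the stages run sequentially here rather than on separate processors.

```python
from collections import defaultdict, deque


class MintControlSketch:
    """Toy model of the memory block loop: allocate, route, collect, free."""

    def __init__(self, num_blocks: int):
        self.free_blocks = deque(range(num_blocks))
        self.pending = defaultdict(list)   # destination NIM -> block addresses

    def allocate(self) -> int:
        """Stage 1: hand a free block address to an XLH.

        Raises IndexError when no block is free."""
        return self.free_blocks.popleft()

    def route(self, block: int, header: dict, routing_table: dict) -> int:
        """Stage 2: pair the block with its header and translate the name."""
        dest_nim = routing_table[header["dest"]]
        self.pending[dest_nim].append(block)
        return dest_nim

    def transmit(self, dest_nim: int) -> list:
        """Stages 3 and 4: collect the blocks bound for one NIM as a switch
        unit, then return them to the free list once they are sent."""
        blocks = self.pending.pop(dest_nim, [])
        self.free_blocks.extend(blocks)
        return blocks


mint = MintControlSketch(num_blocks=4)
table = {"alice": 9}                     # hypothetical MAN name -> NIM mapping
b0, b1 = mint.allocate(), mint.allocate()
mint.route(b0, {"dest": "alice"}, table)
mint.route(b1, {"dest": "alice"}, table)
free_while_queued = len(mint.free_blocks)
sent = mint.transmit(9)
print(sent, len(mint.free_blocks))
```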



2.4 MAN Operation

The EUS 26 is viewed by the network as a user with capabilities granted by a network administration.
This is analogous to a terminal user logged into a time-sharing system. The user, such as a workstation or a
front end processor acting as a concentrator for stations or even networks, will be required to make a
physical connection at a NIM port and then identify itself via its MAN name, virtual network identification,
and password security. The network adjusts routing tables to map data destined for this name to a unique
NIM port. The capabilities of this user are associated with the physical port. The example just given
accommodates the paradigm of a portable workstation. Ports may also be configured to have fixed
capabilities and possibly be "owned" by one MAN named end user. This gives users dedicated network
ports or provides privileged administrative maintenance ports. The source EUSs refer to the destination by
MAN names or services, so they are not required to know anything about the dynamic network topology.

The high bit rate and large transaction processing capability internal to the network yield very short
response times and provide the EUS with a means to move data in a metropolitan area without undue
network considerations. A MAN end user will see EUS-memory-to-EUS-memory response times as low as a
millisecond, low error rates, and the ability to send a hundred EUS transactions per second on a sustained
basis. This number can expand to several thousand for high performance EUSs. The EUS will send data in
whatever size is appropriate to its needs with no maximum upper bound. Most of the limitations on
optimizing MAN performance are imposed by the limits of the EUS and applications, not the overhead of
the network. The user will supply the following information on transmitting data to the UIM:

- A MAN name and virtual network name for the destination address that is independent of the physical
address.

- The size of the data.

- A MAN type field denoting network service required.

- The data.

Network transactions (packets and SUWUs) move along the following logical path (see FIG. 5):

source UIM -> source NIM -> MINT -> MANS -> destination NIM (via MINT) -> destination UIM.

Each EUS transaction (i.e., LUWU or SUWU) is submitted to its UIM. Inside the UIM, a LUWU is further
fragmented into variable size packets. An SUWU is not fragmented but is logically viewed in its entirety as
a network transaction. However, the determination that a network transaction is an SUWU is not made until
the SUWU reaches the MINT where the information is used in dynamically categorizing data into SUWUs
and packets for optimal network handling. The NIM checks incoming packets from the EUS to verify that
they do not violate a maximum packet size. The UIM may pick packet sizes smaller than the maximum
depending on EUS stated service. For optimum MINT memory utilization, the packet size is the standard
maximum. However under some circumstances, the application may request that a smaller packet size be
used because of end user considerations such as timing problems or data availability timing. Additionally,
there may be timing limits where the UIM will send what it currently has from the EUS. Even where the
maximum size packet is used, the last packet of a LUWU usually is smaller than the maximum size packet.

At the transmitting UIM each network transaction (packet or SUWU) is prefixed with a fixed length MAN
network header. It is the information within this header which the MAN network software uses to route, bill,
offer network services, and provide network control. The destination UIM also uses the information within
this header in its job of delivering EUS transactions to the end user. The network transactions are stored in
the UIM source transaction queue from which they are transmitted to the source NIM.

Upon receiving network transactions from UIMs, the NIM receives them in queues permanently
dedicated to the EUSLs on which the transactions arrived, for forwarding to the MINT 11 as soon as the link
3 becomes available. The control software within the NIM processes the UIM to NIM protocol to identify
control messages and prepends a source port number to the transaction that will be used by the MINT to
authenticate the transaction. End-user data will never be touched by MAN network software unless the data
is addressed to the network as control information provided by the end user. As the transactions are
processed, the source NIM concentrates them onto the external link between the source NIM and its MINT.
The source NIM to MINT links terminate at a hardware interface in the MINT (the external link handler or
XLH 16).

The external link protocol between the NIM and MINT allows the XLH 16 to detect the beginning and
end of network transactions. The transactions are immediately moved into a memory 18 designed to handle
the 150 Mb/s bursts of data arriving at the XLH. This memory access is via a high-speed time slotted ring 19
which guarantees each 150 Mb/s XLH input and each 150 Mb/s output from the MINT (i.e., MANS inputs)
bandwidth with no contention. For example, a MINT which concentrates 4 remote NIMs and has 4 input
ports to the center switch must have a burst access bandwidth of at least 1.2 Gb/s. The memory storage is
used in fixed length blocks of a size equal to the maximum packet size plus the fixed length MAN header.
The XLH moves an address of a fixed size memory block followed by the packet or SUWU data to the
memory access ring. The data and network header are stored until the MINT control 20 causes its
transmission into the MANS. The MINT control 20 will continually supply the XLHs with free memory block
addresses for storing the incoming packets and SUWUs. The XLH also "knows" the length of the fixed size
network header. With this information the XLH passes a copy of the network header to MINT control 20.
MINT control 20 pairs the header with the block address it had given the XLH for storing the packet or
SUWU. Since the header is the only internal representation of the data within MINT control it is vital that it
be correct. To ensure sanity due to potential link errors the header has a cyclic redundancy check (CRC) of
its own. The path this tuple takes within MINT control must be the same for all packets of any given LUWU
(this allows ordering of LUWU data to be preserved). Packet and SUWU headers paired with the MINT
memory block address will move through a pipeline of processors. The pipeline allows multiple CPUs to
process different network transactions at various stages of MINT processing. In addition, there are multiple
pipelines to provide concurrent processing.

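
The burst bandwidth required of the ring and the size of a memory block follow directly from the figures in the preceding paragraph. A minimal sketch, with hypothetical function names and a hypothetical maximum packet size:

```python
LINK_MBPS = 150      # bit rate of each XLH input and each MANS-bound output
HEADER_BITS = 320    # fixed length MAN header (40 bytes)


def ring_burst_bandwidth_mbps(xlh_inputs: int, mans_outputs: int) -> int:
    """Bandwidth the ring must guarantee so that no link ever contends."""
    return (xlh_inputs + mans_outputs) * LINK_MBPS


def block_size_bits(max_packet_bits: int) -> int:
    """Fixed block size: maximum packet plus the fixed length header."""
    return max_packet_bits + HEADER_BITS


print(ring_burst_bandwidth_mbps(4, 4))   # 4 NIMs in, 4 switch ports out
print(block_size_bits(8000))             # 8000 bits is an assumed maximum
```

For the 4-input, 4-output MINT this gives 1200 Mb/s, that is, 1.2 Gb/s.
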
MINT control 20 selects an unused internal link 24 and requests a path setup from the IL to the
destination NIM (through the MINT attached to that NIM). MAN switch control 22 queues the request and
when (1) the path is available and (2) the XL 3 to the destination NIM is also available, it notifies the source
MINT while concurrently setting up the path. This, on average and under full load, takes 50 microseconds.
Upon notification, the source MINT transmits all network transactions destined for that NIM, thus taking
maximum advantage of the path setup. The internal link handler 17 requests network transactions from the
MINT memory and transmits them over the path:

ILH -> source IL -> MANS -> destination IL -> XLH,

this XLH being attached to the destination NIM. The XLH recovers bit synchronization on the way to the
destination NIM. Note that information, as it leaves the switch, simply passes through a MINT on its way to
the destination NIM. The MINT doesn't process it in any way other than to recover bit synchronization that
has been lost in going through the MANS.

As information (i.e., switch transactions made up of one or more network transactions) arrives at the
destination NIM it is demultiplexed into network transactions (packets and SUWUs) and forwarded to the
destination UIMs. This is done "on the fly"; there is no buffering in the NIM on the way out of the network.

The receiving UIM 13 will store the network transactions in its receive buffer memory 90 and recreate
EUS transactions (LUWUs and SUWUs). A LUWU may arrive at the UIM in packet sized pieces. As soon as
at least part of a LUWU arrives, the UIM will notify the EUS of its existence and will, upon instructions from
the EUS, transmit under the control of its DMA partial EUS or whole EUS transactions into the EUS
memory in DMA transfer sizes specified by the EUS. Alternate paradigms exist for transfer from UIM to
EUS. For instance, an EUS can tell the UIM ahead of time that whenever anything arrives the UIM should
transfer it to a specified buffer in EUS memory. The UIM would then not need to announce the arrival of
information but would immediately transfer it to the EUS.



2.5 Additional Considerations



2.5.1 Error Handling

In order to achieve latencies in the order of hundreds of microseconds from EUS memory to EUS
memory, errors must be handled in a manner that differs from that used by conventional data networks
today. In MAN, network transactions have a header check sequence 626 (FIG. 20) (HCS) appended to the
header and a data check sequence 646 (FIG. 20) (DCS) appended to the entire network transaction.

Consider the header first. The source UIM generates a HCS before transmission to the source NIM. At
the MINT the HCS is checked and, if in error, the transaction is discarded. The destination NIM performs a
similar action for a third time before routing the transaction to the destination UIM. This scheme prevents
misdelivery of information due to corrupted headers. Once a header is found to be flawed, nothing in the
header can be considered reliable and the only option that MAN has is to discard the transaction.

The source UIM is also required to provide a DCS at the end of the user data. This field is checked
within the MAN network but no action is taken if errors are found. The information is delivered to the
destination UIM which can check it and take appropriate action. Its use within the network is to identify both
EUSL and internal network problems.

Note that there is never any attempt within the network to correct errors using the usual automatic repeat
request (ARQ) techniques found in most of today's protocols. The need for low latency precludes this. Error
correcting schemes would be too costly except for the headers, and even here the time penalty may be too
great as has sometimes been the case in computer systems. However, header error correction may be
employed later if experience proves that it is needed and time-wise possible.

Consequently, MAN checks for errors and discards transactions when there is reason to suspect the
validity of the headers. Beyond this, transactions are delivered even if flawed. This is a reasonable approach
for three reasons. First, intrinsic error rates over optical fibers are of the same order as error rates over
copper when common ARQ protocols are employed. Both are in the range of 10^-11 bits per bit. Secondly,
graphics applications (which are increasing dramatically) often can tolerate small error rates where pixel
images are transmitted; a bit or two per image would usually be fine. Finally, where error rates need to be
better than the intrinsic rates, EUS-to-EUS ARQ protocols can be used (as they are today) to achieve these
improved error rates.
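
The asymmetric treatment of header and data errors can be sketched as follows. This is an illustration under stated assumptions: CRC-32 from the Python standard library stands in for the check sequence algorithm, which is not specified in this section; the DCS is assumed to cover header plus user data; and the function names and dictionary layout are hypothetical.

```python
import zlib


def check_seq(data: bytes) -> int:
    """Stand-in check sequence (CRC-32); the real algorithm is an assumption."""
    return zlib.crc32(data) & 0xFFFFFFFF


def build(header: bytes, payload: bytes) -> dict:
    """Source UIM: compute the HCS over the header and the DCS over the
    whole transaction."""
    return {
        "header": header,
        "hcs": check_seq(header),
        "payload": payload,
        "dcs": check_seq(header + payload),
    }


def network_check(tx: dict, stats: dict):
    """Network element: discard on a header error; on a data error only
    record it, and deliver the transaction anyway."""
    if check_seq(tx["header"]) != tx["hcs"]:
        stats["discarded"] += 1
        return None
    if check_seq(tx["header"] + tx["payload"]) != tx["dcs"]:
        stats["dcs_errors"] += 1   # kept for fault isolation, no retransmission
    return tx


stats = {"discarded": 0, "dcs_errors": 0}
good = build(b"hdr", b"data")
bad_data = dict(good, payload=b"dat4")     # corrupted user data
bad_header = dict(good, header=b"hdx")     # corrupted header

delivered_good = network_check(good, stats)
delivered_bad_data = network_check(bad_data, stats)
delivered_bad_header = network_check(bad_header, stats)
print(stats)
```

No retransmission path appears anywhere in the sketch, matching the absence of ARQ within the network.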



2.5.2 Authentication

MAN provides an authentication feature. This feature assures a destination EUS of the identity of the
source EUS for each and every transaction it receives. Malicious users cannot send transactions with forged
"signatures". Users are also prevented from using the network free of charge; all users are forced to
identify themselves truthfully with each and every transaction that they send into the network, thus providing
for accurate usage-sensitive billing. This feature also provides the primitive capability for other features such
as virtual private networks.

When an EUS first attaches to MAN, it "logs in" to a well known and privileged Login Server that is part
of the network. The login server is in an administrative terminal 350 (FIG. 15) with an attached disk memory
351. The administrative terminal 350 is accessed via an OA&M MINT processor 315 (FIG. 14) and a MINT
OA&M monitor 317 in the MINT central control 20, and an OA&M central control (FIG. 15). This login is
achieved by the EUS (via its UIM) sending a login transaction to the server through the network. This
transaction contains the EUS identification number (its name), its requested virtual network, and a password.
In the NIM a port number is prefixed to the transaction before it is forwarded to the MINT for routing to the
server. The Login Server notes the id/port pairing and informs the MINT attached to the source NIM of that
pairing. It also acknowledges its receipt of the login to the EUS, telling the EUS that it may now use the
network.

When using the network, each and every network transaction that is sent to the source NIM from the
EUS has, within its header, its source id plus other information in the header described below with respect
to FIG. 20. The NIM prefixes the port number to the transaction and forwards it to the MINT where the
pairing is checked. Incorrect pairing results in the MINT discarding the transaction. In the MINT, the
prefixed source port number is replaced with a destination port number before it is sent to the destination
NIM. The destination NIM uses this destination port number to complete the routing to the destination EUS.

If an EUS wishes to disconnect from the network, it "logs off" in a manner similar to its login. The Login
Server informs the MINT of this and the MINT removes the id/port information, thus rendering that port
inactive.
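
The id/port pairing check performed in the MINT can be sketched as a table lookup. The class name, method names, header field names, and the separate table of destination ports are all hypothetical; only the behavior (record pairing on login, discard on mismatch, replace source port with destination port, remove pairing on logoff) is taken from the text.

```python
class MintAuthSketch:
    """Toy model of the MINT's source id / source port pairing check."""

    def __init__(self):
        self.port_of = {}   # EUS id -> port, as reported by the Login Server

    def login(self, eus_id: str, port: int) -> None:
        self.port_of[eus_id] = port

    def logoff(self, eus_id: str) -> None:
        self.port_of.pop(eus_id, None)   # port becomes inactive

    def admit(self, prefixed_port: int, header: dict, dest_ports: dict):
        """Return (destination port, header), or None when the transaction
        must be discarded."""
        if self.port_of.get(header["source_id"]) != prefixed_port:
            return None                  # forged id or inactive port
        dest_port = dest_ports.get(header["dest_id"])
        if dest_port is None:
            return None                  # unknown destination (assumed policy)
        return dest_port, header         # source port replaced by destination port


mint_auth = MintAuthSketch()
mint_auth.login("eus-a", 3)
routes = {"eus-b": 12}                   # hypothetical destination port table
hdr = {"source_id": "eus-a", "dest_id": "eus-b"}

accepted = mint_auth.admit(3, hdr, routes)
forged = mint_auth.admit(5, hdr, routes)   # same id arriving on the wrong port
mint_auth.logoff("eus-a")
after_logoff = mint_auth.admit(3, hdr, routes)
print(accepted, forged, after_logoff)
```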

2.5.3 Guaranteed Ordering

From NIM to NIM the notion of a LUWU does not exist. Even though LUWUs lose their identity within
the NIM to NIM envelope, the packets of a given LUWU must follow a path through predetermined XLs and
MINTs. This allows ordering of packets arriving at UIMs to be preserved for a LUWU. However, packets
may be discarded due to flawed headers. The UIM checks for missing packets and notifies the EUS in the
event that this occurs.

2.5.4 Virtual Circuits and Infinite LUWUs

The network does not set up a circuit through to the destination but rather switches groups of packets
and SUWUs as resources become available. This does not prevent the EUS from setting up virtual circuits;
for example, the EUS could write an infinite size LUWU with the appropriate UIM timing parameters. Such a
data stream would appear to the EUS as a virtual circuit, while to the network it would be a never-ending
LUWU that moves packets at a time. The implementation of this concept must be handled between the UIM
and the EUS protocols since there may be many different types of EUSs and UIMs. The end-user can be
transmitting multiple data streams to any number of destinations at any one time. These streams are
multiplexed on packet and SUWU boundaries on the transmit link between the source UIM and the source
NIM.

A parameter, to be adjusted for optimum performance as the system is loaded, limits the time
(equivalent to limiting the length of the data stream) that one MINT can send data to a NIM in order to free
that NIM to receive data from other MINTs. An initial value of 2 milliseconds appears reasonable based on
simulations. The value can be adjusted dynamically in response to traffic patterns in the system, with
different values possible for different MINTs or NIMs, and at different times of the day or different days of
the week.
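To make the equivalence between a time limit and a data-stream length concrete, the following Python sketch converts the 2 millisecond initial value into bits. It assumes the MINT-to-NIM link runs at the 150 Mb/s rate quoted elsewhere in this description; the function and constant names are hypothetical.

```python
def burst_limit_bits(time_limit_us, link_rate_mbps):
    """Longest burst, in bits, that fits in the time limit.

    Microseconds multiplied by megabits per second gives bits directly.
    """
    return time_limit_us * link_rate_mbps


INITIAL_LIMIT_US = 2_000   # the 2 millisecond initial value from the text
LINK_RATE_MBPS = 150       # assumed link rate

print(burst_limit_bits(INITIAL_LIMIT_US, LINK_RATE_MBPS))  # bits per burst
```

Under these assumptions one MINT may hold a NIM for at most 300,000 bits before the NIM is freed for other MINTs.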

3 SWITCH 

The MAN switch (MANS) is the fast circuit switch at the center of the MAN hub. It interconnects the
MINTs, and all end-user transactions must pass through it. The MANS consists of the switch fabric itself
(called the data network or DNet), plus the switch control complex (SCC), a collection of controllers and
links that operate the DNet fabric. The SCC must receive requests from the MINTs to connect or disconnect
pairs of incoming and outgoing internal links (ILs), execute the requests when possible, and inform the
MINTs of the outcome of their requests.

These apparently straightforward operations must be carried out at a high performance level. The
demands of the MAN switching problem are discussed in the next section. Next, Section 3.2 presents the
fundamentals of a distributed-control circuit-switched network that is offered as a basis for a solution to such
switching demands. Section 3.3 tailors this approach to the specific needs of MAN and covers some
aspects of the control structure that are critical to high performance.



3.1 Characterizing the Problem 

First we estimate some numerical values for the demands on the MAN switch. Nominally, the MANS
must establish or remove a transaction's connection in fractions of a millisecond in a network with hundreds
of ports, each running at 150 Mb/s and each carrying thousands of separately switched transactions per
second. Millions of transaction requests per second imply a distributed control structure where numerous
pipelined controllers process transaction requests in parallel.

The combination of so many ports each running at high speed has several implications. First, the
bandwidth of the network must be at least 150 Gb/s, thus requiring multiple data paths (nominally 150 Mb/s)
through the network. Second, a 150 Mb/s synchronous network would be difficult to build (although an
asynchronous network needs to recover clock or phase). Third, since inband signaling creates a more
complex (self-routing) network fabric and requires buffering within the network, an out-of-band signaling
(separate control) approach is desirable.

In MAN, transaction lengths are expected to vary by several orders of magnitude. These transactions
can share a single switch, as discussed hereinafter, with adequate delay performance for small transactions.
The advantage of a single fabric is that data streams do not have to be separated before switching and
recombined afterwards.

A problem to be dealt with is the condition where the requested output port is busy. To set up a
connection, the given input and output ports must be concurrently idle (the so-called concurrency problem).
If an idle input (output) port waits for the output (input) to become idle, the waiting port is inefficiently
utilized and other transactions needing that port are delayed. If the idle port is instead given to other
transactions, the original busy destination port may have become idle and busy again in the meantime, thus
adding further delay to the original transaction. The delay problem is worse when the port is busy with a
large transaction.

Any concurrency resolution strategy requires that each port's busy/idle status be supplied to the
controllers concerned with it. To maintain a high transaction rate, this status update mechanism must
operate with short delays.

If transaction times are short and most delays are caused by busy ports, an absolutely non-blocking
network topology is not required, but the blocking probability should be small enough so as not to add
much to delays or burden the SCC with excessive unachievable connection requests.

Broadcast (one to many) connections are a desirable network capability. However, even if the network
supports broadcasting, the concurrency problem (here even worse with the many ports involved) must be
handled without disrupting other traffic. This seems to rule out the simple strategy of waiting for all
destination ports to become idle and broadcasting to all of them at once.

Regardless of the special needs of the MAN network, the MANS satisfies the general requirements for
any practical network. Startup costs are reasonable. The network is growable without disrupting existing
fabric. The topology is inherently efficient in its use of fabric and circuit boards. Finally, the concerns of
operational availability - reliability, fault tolerance, failure-group sizes, and ease of diagnosis and repair - are
met.



3.2 General Approach: A Distributed-Control Circuit-Switching Network

In this section we describe the basic approach used in the MANS. It specifically addresses the means
by which a large network can be run by a group of controllers operating in parallel and independently of
one another. The distributed control mechanism is described in terms of two-stage networks, but with a
scheme to extend the approach to multistage networks. Section 3.3 presents details of the specific design for
MAN.

A major advantage of our approach is that the plurality of network controllers operate independently of
one another using only local information. Throughput (measured in transactions) is increased because
controllers do not burden each other with queries and responses. Also, the delay in setting up or tearing
down connections is reduced because the number of sequential control steps is minimized. All this is
possible because the network fabric is partitioned into disjoint subsets, each of which is controlled solely by
its own controller that uses global static information, such as the internal connection pattern of the data
network 120, but only local dynamic (network state) data. Thus, each controller sees and handles only those
connection requests that use the portion of the network for which it is responsible, and monitors the state of
only that portion.

3.2.1 Partitioning Two-Stage Networks

Consider the 9 x 9 two-stage network example in FIG. 6 comprising three input switches IS1 (101), IS2
(102), and IS3 (103), and three output switches OS1 (104), OS2 (105), and OS3 (106). We can partition its
fabric into three disjoint subsets. Each subset includes the fabric in a given second stage switch (OSx) plus
the fabric (or crosspoints) in the first stage switches (ISy) that connect to the links going to that second
stage switch. For example, in FIG. 6, the partition or subset associated with OS1 (104) is shown by a
dashed line around the crosspoints in OS1 plus dashed lines around three crosspoints in each of the first
stage switches (101, 102, 103) (those crosspoints being those that connect to the links to OS1).

Now, consider a controller for this subset of the network. It would be responsible for connections from
any inlet to any outlet on OS1. The controller would maintain busy/idle status for the crosspoints it
controlled. This information is clearly enough to tell whether a connection is possible. For example, suppose
an inlet on IS1 is to be connected to an outlet on OS1. We assume that the request is from the inlet, which
must be idle. The outlet can be determined to be idle from outlet busy/idle status memory or else from the
status of the outlet's three crosspoints in OS1 (all three must be idle). Next, the status of the link between
IS1 and OS1 must be checked. This link will be idle if the two crosspoints on both ends of the link, which
connect the link to the remaining two inlets and outlets, are all idle. If the inlet, outlet, and link are all idle, a
crosspoint in each of IS1 and OS1 can be closed to set up the requested connection.
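The decision made by the controller of one subset can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the controller keeps outlet and link status memories rather than individual crosspoint states, the requesting inlet is taken to be idle, and the class and method names are hypothetical.

```python
class SubsetController:
    """Controller for one second stage switch of the 9 x 9 example.

    It holds only local dynamic data: the status of its own outlets and
    of the single link arriving from each first stage switch.
    """

    def __init__(self, n_first_stage=3, n_outlets=3):
        self.link_busy = [False] * n_first_stage
        self.outlet_busy = [False] * n_outlets

    def connect(self, first_stage, outlet):
        # The request comes from an inlet assumed to be idle, so only the
        # outlet and the link to this second stage switch are checked.
        if self.outlet_busy[outlet] or self.link_busy[first_stage]:
            return False
        self.link_busy[first_stage] = True
        self.outlet_busy[outlet] = True
        return True

    def disconnect(self, first_stage, outlet):
        self.link_busy[first_stage] = False
        self.outlet_busy[outlet] = False


os1 = SubsetController()
print(os1.connect(0, 0))   # inlet on IS1 to first outlet: succeeds
print(os1.connect(0, 1))   # second inlet on IS1: its only link is busy
print(os1.connect(1, 1))   # inlet on IS2 to second outlet: succeeds
```

Because no state outside the subset is consulted, three such controllers could serve the three second stage switches without exchanging messages.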

Note that this activity can proceed independently of activities in the other (disjoint) subsets of the
network. The reason is that the network has only two stages, so the inlet switches may be partitioned
according to their links to second stage switches. In theory this approach applies to any two-stage network,
but the usefulness of the scheme depends on the network's blocking characteristics. The network in FIG. 6
would block too frequently, because it can connect at most one inlet on a given inlet switch to an outlet on
a given second stage switch.

A two-stage network, referred to hereinafter as a Richards network, of the type described in G. W.
Richards et al.: "A Two-Stage Rearrangeable Broadcast Switching Network", IEEE Transactions on
Communications, v. COM-33, no. 10, October 1985, avoids this problem by wiring each inlet port to multiple
appearances spread over different inlet switches. The distributed control scheme operates on a Richards
network, even though MAN may not use such Richards network features as broadcast and rearrangement.



3.2.2 Control Network

3.2.2.1 Function

In MAN, requests for connections come from inlets, actually, the central control 20 of the MINTs. These
requests must be distributed to the proper switch controller via a control network (CNet). In FIG. 7, both the
DNet 120 for circuit-switched transactions and the control CNet 130 are shown. The DNet is a two-stage
rearrangeably non-blocking Richards network. Each switch 121, 123 includes a rudimentary crosspoint
controller (XPC) 122, 124 which accepts commands to connect a specified inlet on the switch to a specified
outlet by closing the proper crosspoint. The first and second stages' XPCs (122, 124) are abbreviated 1SC
(first stage controller) and 2SC (second stage controller) respectively.

On the right side of the CNet are 64 MANS controllers 140 (MANSCs) corresponding to and controlling
64 disjoint subsets of the DNet, partitioned by second stage outlet switches as described earlier. Since the
controllers and their network are overlaid on the DNet and not integral to the data fabric, they could be
replaced by a single controller in applications where transaction throughput is not critical.



3.2.2.2 Structure

The CNet shown in FIG. 7 has special properties. It consists of three similar parts 130, 134, 135,
corresponding to flows of messages from a MINT to a MANSC, orders from a MANSC to an XPC, and
acknowledgments (ACKs) or negative acknowledgments (NAKs) from a MANSC to a MINT. Each of the
networks 130, 134 and 135 is a statistically multiplexed time-division switch, and comprises a bus 132, a
group of interfaces 133 for buffering control data to a destination or from a source, and a bus arbiter
controller (BAC) 131. The bus arbiter controller controls the gating of control data from an input to the bus.
The address of the destination selects the output to which the bus is to be gated. The output is connected
to a controller (network 130; a MANSC 140) or an interface (networks 134 and 135, interfaces similar to
interface 133). The request inputs and ACK/NAK responses are concentrated by control data concentrators
and distributors 136, 138, each control data concentrator concentrating data to or from four MINTs. The
control data concentrators and distributors simply buffer data from or to the MINTs. The interfaces 133 in
the CNet handle statistical demultiplexing and multiplexing (steering and merging) of control messages. Note
that the interconnections made by bus 132 for a given request message in the DNet are the same as those
requested in the CNet.



3.2.3 Connection Request Scenario

The connection request scenario begins with a connection request message arriving at the left of CNet
130 in a multiplexed stream on one of the message input links 137 from one of the data concentrators 136.
This request includes the DNet 120 inlet and outlet to be connected. In the CNet 130, the message is
routed to the appropriate link 139 on the right side of the CNet according to the outlet to be connected,
which is uniquely associated with a particular second stage switch and therefore also with a particular
MANS controller 140.

This MANSC consults a static global directory (such as a ROM) to find which first stage switches carry
the requesting inlet. Independently of other MANSCs, it now checks dynamic local data to see whether the
outlet is idle and any links from the proper first stage switches are idle. If the required resources are idle,
the MANSC sends a crosspoint connect order to its own second stage outlet switch plus another order to
the proper first stage switch via network 134. The latter order includes a header to route it to the correct
first stage.

This approach can achieve extremely high transaction throughput for several reasons. All network
controllers can operate in parallel, independently of one another, and need not wait for one another's data or
go-aheads. Each controller sees only those requests for which it is responsible and does not waste time
with other messages. Each controller's operations are inherently sequential and independent functions and
thus may be pipelined with more than one request in progress at a time.

The above scenario is not the only possibility. Variables to be considered include broadcast -vs-
point-to-point inlets, outlet- -vs- inlet-oriented connection requests, rearrangement -vs- blocking-allowed
operation, and disposition of blocked or busy connect requests. Although these choices are already settled for
MAN, all these options can be handled with the control topology presented, simply by changing the logic in
the MANSCs.

3.2.4 Multistage Networks 

This control structure is extendible to multistage Richards networks, where switches in a given stage
are recursively implemented as two-stage networks. The resultant CNet is one in which connection requests
pass sequentially through S-1 controllers in an S-stage network, where again, controllers are responsible for
disjoint subsets of the network and operate independently, thus retaining the high throughput potential.



3.3 Specific Design for MAN

In this section we first examine those system attributes that drive the design of the MANS. Next, the
data and control networks are described. Finally, the functions of the MANS controller are discussed in
detail, including design tradeoffs that affect performance.

3.3.1 System Attributes

3.3.1.1 External and Internal Interfaces

FIG. 7 illustrates a prototypical fully-grown MANS composed of a DNet 120 with 1024 incoming and
1024 outgoing ILs and a CNet 22 comprising three control message networks 130, 134, 135, each with 64
incoming and 64 outgoing message links. The ILs are partitioned into groups of 4, one group for each of
256 MINTs. The DNet is a two-stage network of 64 first stage switches 121 and 64 second stage switches
123. Each switch includes an XPC 122 that takes commands to open and close crosspoints. For each of the
DNet's 64 second stages 123, there is an associated MANSC 140 with a dedicated control link to the XPC
124 in its second stage switch.

Each control link and status link interfaces 4 MINTs to the CNet's left-to-right and right-to-left switch
planes via 4:1 control data concentrators and distributors 136, 138, which are also part of the CNet 22. These
may be regarded either as remote concentrators in each 4-MINT group or as parts of their associated 1:64
CNet 130, 135 stages; in the present embodiment, they are part of the CNet. A third 64x64 plane 134 of the
CNet gives each MANSC 140 a dedicated right-to-left interface 133 with one link to each of the 64 1SCs
122. Each MINT 11 interfaces with the MANS 10 through its four ILs 12, its request signal to control data
concentrator 136, and the acknowledge signal received back from control data distributor 138.

Alternately, each CNet could have 256 instead of 64 ports on its MINT side, eliminating the
concentrators.

3.3.1.2 Size

The MANS diagram in FIG. 7 represents a network needed to switch data traffic for up to 20,000 EUSs.
Each NIM is expected to handle and concentrate the traffic of 10 to 20 EUSs onto a 150 Mb/s XL, giving
about 1000 XLs (rounded off in binary to 1024). Each MINT serves 4 XLs for a total of 256 MINTs. Each
MINT also handles 4 ILs, each with an input and an output termination on the DNet portion of the MANS.
The data network thus has 1024 inputs and 1024 outputs. Internal DNet link sizing will be addressed later.

Failure-group size and other considerations lead to a DNet with 32 input links on each first stage switch
121, each of which links is connected to two such switches. There are 16 outputs on each second stage
switch 123 of the DNet. Thus, there are 64 of each type of switch and also 64 MANSCs 140 in the CNet,
one per second stage switch.
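The sizing arithmetic above can be reproduced with a few lines of Python. The sketch takes the upper end of the stated 10 to 20 EUSs per NIM; the constant and function names are hypothetical and serve only to retrace the figures given in the text.

```python
MAX_EUS = 20_000
EUS_PER_NIM = 20                  # upper end of the stated 10 to 20 range
XLS_PER_MINT = 4
ILS_PER_MINT = 4
INPUTS_PER_FIRST_STAGE = 32
APPEARANCES_PER_INLET = 2         # each input link goes to two switches
OUTPUTS_PER_SECOND_STAGE = 16


def next_power_of_two(n):
    """Round n up to a power of two ("rounded off in binary")."""
    return 1 << (n - 1).bit_length()


xls = next_power_of_two(MAX_EUS // EUS_PER_NIM)        # 1000 -> 1024
mints = xls // XLS_PER_MINT
dnet_ports = mints * ILS_PER_MINT
first_stage = dnet_ports * APPEARANCES_PER_INLET // INPUTS_PER_FIRST_STAGE
second_stage = dnet_ports // OUTPUTS_PER_SECOND_STAGE

print(xls, mints, dnet_ports, first_stage, second_stage)
```

With these inputs the sketch yields 1024 XLs, 256 MINTs, 1024 DNet ports, and 64 switches of each type, matching the values stated above.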

3.3.1.3 Traffic and Consolidation

The "natural" EUS transactions of data to be switched vary in size by several orders of magnitude, from
SUWUs of a few hundred bits to LUWUs of a megabit or more. As explained in Section 2.1.1, MAN breaks
larger EUS transactions into network transactions or packets of at most a few thousand bits each. But the
MANS deals with the switch transaction, defined as the burst of data that passes through one MANS
connection per one connect (and disconnect) request. Switch transactions can vary in size from a single
SUWU to several LUWUs (many packets) for reasons about to be given. For the rest of Section 3,
"transaction" means "switch transaction" except as noted.

For a given total data rate through the MANS, the transaction throughput rate (transactions/second)
varies inversely with the transaction size. Thus, the smaller the transaction size, the greater the transaction
throughput must be to maintain the data rate. This throughput is limited by the individual throughputs of the
MANSCs (whose connect/disconnect processing delays reduce the effective IL bandwidth) and also by
concurrency resolution (waiting for busy outlets). Each MANSC's overhead per transaction is of course
independent of transaction size.

Although larger transactions reduce the transaction throughput demands, they will add more delays to
other transactions by holding outlets and fabric paths for longer times. A compromise is needed - small
transactions reduce blocking and concurrency delays, but large transactions ease the MANSC and MINT
workloads and improve the DNet duty cycle. The answer is to let MAN dynamically adjust its transaction
sizes under varying loads for the best performance.

The DNet is large enough to handle the offered load, so the switching control complex's (SCC)
throughput is the limiting factor. Under light traffic, the switch transactions will be short, mostly single
SUWUs and packets. As traffic levels increase so does the transaction rate. As the SCC transaction rate
capacity is approached, transaction sizes are dynamically increased to maintain the transaction rate just
below the point where the SCC would overload. This is achieved automatically by the consolidation control
strategy, whereby each MINT always transmits in a single switch transaction all available SUWUs and
packets targeted for a given destination, even though each burst may contain the whole or parts of several
EUS transactions. Further increases in traffic will increase the size, but not so much the number, of
transactions. Thus fabric and IL utilization improve with load, while the SCC's workload increases only
slightly. Section 3.3.3.2.1 explains the feedback mechanism that controls transaction size.

3.3.1.4 Performance Goals

Nevertheless, MAN's data throughput depends on extremely high performance of individual SCC control
elements. For example, each XPC 122, 124 in the data switch will be ordered to set and clear at least
07,000 connections per second. Clearly, each request must be handled in at most a few microseconds.

Likewise, the MANSCs' functions must be done quickly. We assume that these steps will be pipelined;
then the sum of the step processing times will contribute to connect and disconnect delays, and the
maximum of these step times will limit transaction throughput. We aim to hold the maximum and sum to a
few microseconds and a few tens of microseconds, respectively.
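The relationship between step times, delay, and throughput in a pipelined controller can be shown with a short Python sketch. The five step times used here are invented purely for illustration and are not taken from the design; the function name is likewise hypothetical.

```python
def pipeline_figures(step_times_us):
    """Return (delay in microseconds, requests per second) for a pipeline.

    A request passes through every step, so its delay is the sum of the
    step times. A new request can enter only as fast as the slowest step
    finishes, so throughput is set by the maximum step time.
    """
    delay_us = sum(step_times_us)
    bottleneck_us = max(step_times_us)
    requests_per_second = 1_000_000 / bottleneck_us
    return delay_us, requests_per_second


# Hypothetical step times, in microseconds.
delay, rate = pipeline_figures([2, 3, 5, 4, 2])
print(delay, rate)
```

With these invented values the delay is 16 microseconds while the slowest 5 microsecond step limits the controller to 200,000 requests per second, which is why the maximum and the sum are given separate targets.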

The resolution of the concurrency problem must also be quick and efficient. Busy/idle state of
destination terminals will have to be determined in about 6 microseconds, and the control strategy must
avoid burdening MANSCs with unfulfillable connection requests.

One final performance issue relates to the CNet itself. The network and its access links must run at high
speeds (probably at least 10 Mb/s) to keep control message transmit times small and so that links will run
at low occupancies to minimize the contention delays from statistical multiplexing.

3.3.2 Data Network (DNet)

The DNet is a Richards two-stage rearrangeably non-blocking broadcast network. This topology was
chosen not so much for its broadcast capability, but because its two-stage structure allows the network to
be partitioned into disjoint subsets for distributed control.

3.3.2.1 Design Parameters 

The capabilities of the Richards network derive from the assignment of inlets to multiple appearances
on different first stage switches according to a definite pattern. The particular assignment pattern chosen,
the number m of multiple appearances per inlet, the total number of inlets, and the number of links between
first and second stage switches determine the maximum number of outlets per second stage switch
permitted for the network to be rearrangeably non-blocking.

The DNet in FIG. 7 has 1024 inlets, each with two appearances on the first stage switches. There are
two links between each first and second stage switch. These parameters along with the pattern of
distributing the inlets ensure that with 16 outlets per second stage switch the network will be rearrangeably
non-blocking for broadcast.

Since MAN does not use broadcast or rearrangement, those parameters not justified by failure-group or
other considerations may be changed as more experience is obtained. For example, if a failure group size
of 32 were deemed tolerable, each second stage switch could have 32 outputs, thus reducing the number
of second stage switches by a factor of 2. Making such a change would depend on the ability of the SCC
control elements each to handle twice as much traffic. In addition, blocking probabilities would increase and
it would have to be determined that such an increase would not significantly detract from the performance
of the network.

The network has 64 first stage switches 121 and 64 second stage switches 123. Since each inlet has
two appearances and there are two links between first and second stage switches, each first stage switch
has 32 inlets and 128 outlets and each second stage switch has 128 inlets and 16 outlets.
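The per-switch dimensions follow directly from the stated parameters, as the Python sketch below retraces. The function name and its argument names are hypothetical; the numeric inputs are those given in the text.

```python
def dnet_dimensions(inlets, outlets, appearances, links_per_pair,
                    first_stage, second_stage):
    """Derive the size of each switch from the network parameters."""
    return {
        # every inlet appears on `appearances` first stage switches
        "first_stage_inlets": inlets * appearances // first_stage,
        # each first stage switch has links to every second stage switch
        "first_stage_outlets": second_stage * links_per_pair,
        # each second stage switch has links from every first stage switch
        "second_stage_inlets": first_stage * links_per_pair,
        "second_stage_outlets": outlets // second_stage,
    }


dims = dnet_dimensions(inlets=1024, outlets=1024, appearances=2,
                       links_per_pair=2, first_stage=64, second_stage=64)
print(dims)
```

The result is a 32 x 128 first stage switch and a 128 x 16 second stage switch, in agreement with the figures above.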


3.3.2.2 Operation 

Since each inlet has two appearances and since there are two links between each first and second
stage switch, any outlet switch can access any inlet on any one of four links. The association of inlets to
links is algorithmic and thus may be computed or alternatively read from a table. The path hunt involves
simply choosing an idle link (if one exists) from among the four link possibilities.
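A minimal Python sketch of such a path hunt is given below. Since the inlet assignment pattern is not detailed here, the two first stage switches carrying the inlet are simply passed in as an argument, as if already read from a table; the function name, the dictionary used for link status, and the switch numbers in the example are all assumptions.

```python
def path_hunt(appearances, link_busy, links_per_pair=2):
    """Pick an idle link toward one second stage switch, or None.

    appearances: the first stage switches that carry the requesting inlet.
    link_busy:   maps (first stage switch, link index) to True when busy;
                 links missing from the map are treated as idle.
    """
    for switch in appearances:
        for link in range(links_per_pair):
            if not link_busy.get((switch, link), False):
                return (switch, link)
    return None   # all four candidate links are busy: retry later


# Hypothetical example: the inlet appears on first stage switches 5 and 40.
busy = {(5, 0): True, (5, 1): True}
print(path_hunt((5, 40), busy))
```

The sketch takes the first idle link found; any other selection rule among the four candidates would serve equally well for the purpose described.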

If none of the four links is idle, a re-attempt to make a connection is made later and is requested by the
same MINT. Alternatively, existing connections could be rearranged to remove the blocking condition, a
simple procedure in a Richards network. However, rerouting a connection in midstream could introduce a
phase glitch beyond the outlet circuit's ability to recover phase and clock. Thus with present circuitry, it is
preferable not to run the MANS as a rearrangeable switch.

Each switch in the DNet has an XPC 122, 124 on the CNet which receives messages from the MANSCs
telling which crosspoints to operate. No high-level logic is performed by these controllers.

3.3.3 Control Network and MANS Controller Functions

3.3.3.1 Control Network (CNet)

The CNet 130, 134, 135, briefly described earlier, interconnects the MINTs, MANSCs, and 1SCs. It must
carry three types of messages: connect/disconnect orders from MINTs to MANSCs using block 130,
crosspoint orders from MANSCs to 1SCs using block 134, and ACKs and NAKs from MANSCs back to the
MINTs using block 135. The CNet shown in FIG. 7 has three corresponding planes or sections. The private
MANSC 140 to 2SC 124 links are shown but are not considered part of the CNet as no switching is required.

In this embodiment, the 256 MINTs access the CNet in groups of 4, resulting in 64 input paths to and
64 output paths from the network. The bus elements in the control network perform merging and routing of
message streams. A request message from a MINT includes the ID of the outlet port to be connected or
disconnected. Since the MANSCs are associated one-to-one with second stage switches, this outlet
specification identifies the proper MANSC to which the message is routed.
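The routing rule can be sketched in Python as follows. The sketch assumes that outlet ports and MINTs are numbered contiguously from zero, so that an integer division recovers the second stage switch or the concentrator; the actual numbering plan is not specified here, and the function names are hypothetical.

```python
OUTLETS_PER_SECOND_STAGE = 16
MINTS_PER_CONCENTRATOR = 4


def mansc_for_outlet(outlet_id):
    """MANSC (one per second stage switch) that owns this outlet.

    Assumes outlets 0..15 sit on switch 0, 16..31 on switch 1, and so on.
    """
    return outlet_id // OUTLETS_PER_SECOND_STAGE


def concentrator_for_mint(mint_id):
    """CNet input path shared by a group of 4 MINTs (assumed numbering)."""
    return mint_id // MINTS_PER_CONCENTRATOR


print(mansc_for_outlet(1023), concentrator_for_mint(255))
```

Under this assumed numbering the 1024 outlets map onto MANSCs 0 to 63 and the 256 MINTs onto input paths 0 to 63.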

The MANSCs transmit acknowledgment (ACK), negative acknowledgment (NAK), and 1SC command
messages via the right-to-left portion of the CNet (blocks 134, 135). These messages will also be formatted
with header information to route the messages to the specified MINTs and 1SCs.

The CNet and its messages raise significant technical challenges. Contention problems in the CNet
may mirror those of the entire MANS, requiring their own concurrency solution. These are apparent in the
Control Network shown in FIG. 7. The control data concentrators 136 from four lines into one interface may
have contention where more than one message tries to arrive at one time. The data concentrators 136 have
storage for one request from each of the four connected MINTs, and the MINTs ensure that consecutive
requests are sent sufficiently far apart that the previous request from a MINT has already been passed on
by the concentrator before the next arrives. The MINTs time out if no acknowledgement of a request is
received within a prespecified time. Alternatively, the control data concentrators 136 could simply "OR" any
requests received on any input to the output; garbled requests would be ignored and not acknowledged,
leading to a time out.

Functionally what is needed inside the blocks 130, 134, 135 is a micro-LAN specialized for tiny fixed-
length packets and low contention and minimal delay. Ring nets are easy to interconnect, grow gracefully,
and permit simple tokenless add/drop protocols, but they are ill-suited for so many closely packed nodes
and have intolerable end-to-end delays.

Since the longest message (a MINT's connect order) has under 32 bits, a parallel bus 132 serves as a
CNet fabric that can send a complete message in one cycle. Its arbitration controller 131, in handling
contention for the bus, would automatically solve contention for the receivers. Bus components are
duplicated for reliability (not shown).



3.3.3.2 MAN Switch Controller (MANSC) Operations 



FIGS. 8 and 9 show a flowchart of the MANSC's high level functions. Messages to each MANSC 140
include a connect/disconnect bit, SUWU/packet bit, and the IDs of the MANS input and output ports
involved.



3.3.3.2.1 Request Queues: Consolidation (Intake Section, FIG. 8)

Since the rate of message arrivals at each MANSC 140 can exceed its message processing rate, a
MANSC provides entrance queues for its messages. Connect and disconnect requests are handled
separately. Connects are not enqueued unless their requested outlets are idle.

Priority and regular packet connect messages are provided separate queues 150, 152 so that priority
packets can be given higher priority. An entry from the regular packet queue 152 is processed only if the
priority queue 150 is empty. This minimizes the priority packets' processing delays at the expense of the
regular packets', but it is estimated that priority traffic will not usually be heavy enough to add much to
packet delays. Even so, delays are likely to be more tolerable with the lower priority large data
transactions than with priority transactions. Also, if a packet is one of many pieces of a LUWU, any given
packet delay may have no final effect since end-to-end LUWU delay depends only on the last packet.

Both the priority and regular packet queues are short, intended only to cover short-term random
fluctuations in message arrivals. If the short-term rate of arrivals exceeds the MANSC's processing rate, the
regular packet queue and perhaps the priority queue will overflow. In such cases a control negative
acknowledge (CNAK) is returned to the requesting MINT, indicating a MANSC overload. This is not
catastrophic, but rather the feedback mechanism in the consolidation strategy that increases switch
transaction sizes as traffic gets heavier. Each MINT combines into one transaction all available packets
targeted for a given DNet outlet. Thus, if a connection request by the MINT results in a CNAK, the next
request for the same destination may represent more data to be shipped during the connection, provided
more packets of the LUWUs have arrived at the MINT in the meantime. Consolidation need not always add
to LUWU transmission delay, since a LUWU's last packet might not be affected. This scheme dynamically
increases effective packet (transaction) sizes to accommodate the processing capability of the MANSCs.
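The consolidation feedback can be illustrated from the MINT's side with a short Python sketch. It is a simplified model, not the MINT's actual logic: the class and method names are hypothetical, packets are represented by plain strings, and the MANSC's reply is passed in as a string rather than received over the CNet.

```python
from collections import defaultdict


class MintConsolidator:
    """Holds packets per DNet outlet until a connect request is ACKed."""

    def __init__(self):
        self.pending = defaultdict(list)   # outlet -> waiting packets

    def arrive(self, outlet, packet):
        self.pending[outlet].append(packet)

    def on_reply(self, outlet, reply):
        """Return the burst to send for this reply ('ACK', 'BNAK', 'CNAK').

        A negative reply sends nothing; the packets stay queued, so the
        next request for the same outlet covers everything that has
        accumulated in the meantime.
        """
        if reply != "ACK" or not self.pending[outlet]:
            return []
        return self.pending.pop(outlet)


mint = MintConsolidator()
mint.arrive(7, "p1")
print(mint.on_reply(7, "CNAK"))   # overload: nothing sent, p1 kept
mint.arrive(7, "p2")
print(mint.on_reply(7, "ACK"))    # one transaction now carries both packets
```

The second transaction is twice the size of the one that was refused, which is the sense in which negative acknowledgments increase transaction sizes without extra requests.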

The priority queue is longer than the regular packet queue to reduce the odds of sending a priority
CNAK due to random bursts of requests. Priority packets are less likely to benefit from consolidation than
packets recombining into their original LUWUs; this supports the separate, high-priority queue. To force the
MINTs to consolidate more packets, we may build the regular packet queue shorter than it "ought" to be.
Simulations have indicated that a priority queue of 4 requests capacity and a regular queue of a requests
capacity is appropriate. The sizes of both queues affect system performance and can be fine-tuned with
real experience with a system.

Priority is determined by a priority indicator in the type of service indication 623 (FIG. 2D). Voice
packets are given priority because of their required low delay. In alternative arrangements, all single packet
transactions (SUWUs) may be given priority. Because charges are likely to be higher for high priority
service, users will be discouraged from demanding high priority service for the many packets of a long
LUWU.



3.3.3.2.2 Busy/Idle Check

When a connect request first arrives at a MANSC, it is detected in test 153, which differentiates it from a
disconnect request. The busy/idle status of the destination outlet is checked (test 154). If the destination is
busy, a busy negative acknowledge (BNAK) is returned to the requesting MINT, which will try
again later. Test 158 selects the proper queue (priority or regular packet). The queue is tested (160,182) to
see if it is full. If the specified queue is full, a CNAK (control negative acknowledge) is returned (action 164).
Otherwise the request is enqueued in queue 150 or 152 and simultaneously the destination is seized
(marked busy) (action 166 or 167). Note that an overworked (full queues) MANSC can still return BNAKs,
and that both BNAKs and CNAKs tend to increase transaction sizes through consolidation.
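
The intake decisions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Intake`, the string return values, and the regular queue capacity of 2 are hypothetical (the text gives 4 for the priority queue and says only that the regular queue is shorter), and the disconnect branch follows the later Disconnects section.

```python
from collections import deque
from dataclasses import dataclass, field

PRIORITY_CAPACITY = 4   # priority queue size given in the text
REGULAR_CAPACITY = 2    # assumed for illustration; the text only says "shorter"


@dataclass
class Intake:
    """Toy model of the MANSC intake section (busy/idle check, then queue check)."""
    busy: set = field(default_factory=set)               # outlets currently seized
    priority_q: deque = field(default_factory=deque)     # queue 150
    regular_q: deque = field(default_factory=deque)      # queue 152
    disconnect_q: deque = field(default_factory=deque)   # queue 170

    def request(self, kind: str, outlet: int, priority: bool = False) -> str:
        if kind == "disconnect":
            # The outlet is released at once; the teardown itself is queued.
            self.busy.discard(outlet)
            self.disconnect_q.append(outlet)
            return "QUEUED"
        if outlet in self.busy:
            return "BNAK"          # busy test comes before any queue test
        if priority:
            queue, capacity = self.priority_q, PRIORITY_CAPACITY
        else:
            queue, capacity = self.regular_q, REGULAR_CAPACITY
        if len(queue) >= capacity:
            return "CNAK"          # overload feedback to the MINT
        queue.append(outlet)
        self.busy.add(outlet)      # seize the destination while enqueuing
        return "QUEUED"
```

Because the busy test precedes the queue test, a model MANSC whose queues are full still answers a request for a seized outlet with a BNAK, as the text notes.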

The busy/idle check and BNAK handle the concurrency problem. The penalty paid for this approach is
that a MINT-to-MANS IL is unusable during the interval between a MINT's issuing a connect request for that
IL and its receipt of an ACK or BNAK. Also the CNet jams up with BNAKs and failing requests under heavy
MANS loads. Busy/idle checks must be done quickly so as not to degrade the connection request
throughput and IL utilization; this explains the performance of a busy test before enqueuing. It may be
desirable further to use separate hardware to pretest outlets for concurrency. Such a procedure would
relieve the MANSCs and CNets from repeated BNAK requests, increase the successful request throughput,
and permit the MANS to saturate at a higher percentage of its theoretical aggregate bandwidth.



3.3.3.2.3 Path Hunt - MANSC Service Section (FIG. 9)

Priority block 168 gives highest priority to requests from disconnect queue 170, lower priority to
requests from the priority queue 150, and lowest priority to requests from the packet queue 152. When a
connect request is unloaded from the priority or the regular packet queue, its requested outlet port has
already been seized earlier (action 166 or 167), and the MANSC hunts for a path through the DNet. This
merely involves looking up first the two inlets to which the incoming IL is connected (action 172) to find the
four links with access to that incoming IL and checking their busy status (test 174). If all four are busy, a
fabric blocking negative acknowledge (fabric NAK or FNAK) is returned to the
requesting MINT, which will try the request again later (action 178). Also the seized destination outlet is
released (marked idle). We expect FNAKs to be rare.

If the four links are not all busy, an idle one is chosen and seized: first a first stage inlet, then a link
(action 180); both are marked busy (action 182). The inlet and link choices are stored (action 184). Now the
MANSC uses its dedicated control path to send a crosspoint connect order to the XPC in its associated
second stage switch (action 188); this connects the chosen link to the outlet. At the same time another
crosspoint order is sent (via the right-to-left CNet plane 134) to the 1SC (action 186) required to connect the
link to the inlet port. Once this order arrives at the 1SC (test 190), an ACK is returned to the originating
MINT (action 102).
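
A minimal sketch of the path hunt, together with the release step described in the Disconnects section that follows, may make the bookkeeping clearer. All names here (`path_hunt`, `disconnect`, the dictionary layouts, the string results) are hypothetical, and only link state is modeled; the inlet seizure and the crosspoint orders to the XPCs are left out.

```python
def path_hunt(il, outlet, static_map, link_busy, outlet_busy, path_memory):
    """Try to find an idle link for a connect request whose outlet is already seized.

    static_map[il] lists the candidate (inlet, link) pairs for that IL; the text
    gives four of them (two first stage inlets with two links each).
    """
    for inlet, link in static_map[il]:
        if not link_busy.get(link, False):
            link_busy[link] = True                # seize the idle link
            path_memory[outlet] = (inlet, link)   # recalled later by the disconnect
            return "ACK"
    outlet_busy.discard(outlet)                   # blocked: release the seized outlet
    return "FNAK"


def disconnect(outlet, link_busy, path_memory):
    """Idle the link recorded for this outlet, or report that no record exists."""
    path = path_memory.pop(outlet, None)
    if path is None:
        return "SANITY_NAK"
    link_busy[path[1]] = False
    return "ACK"
```

Only the outlet number is needed for the disconnect because the path memory is keyed by outlet, which mirrors the statement that the MANSC recalls the link and crosspoint choices from local memory.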



3.3.3.2.4 Disconnects 

To release network resources as quickly as possible, disconnect requests are handled separately from
connect requests and at top priority. They have a separate queue 170, built 16 words long (same as the
number of outlets) so it can never overflow. A disconnect is detected in test 153, which receives requests
from the MINT and separates connect from disconnect requests. The outlet is released and the request
placed in disconnect queue 170 (action 193). Now a new connect request for this same outlet can be
accepted even though the outlet is not yet physically disconnected. Due to its higher priority, the
disconnect will tear down the switch connections before the new request tries to reconnect the outlet. Once
enqueued, a disconnect can always be executed. Only the outlet ID is needed to identify the spent
connection; the MANSC recalls this connection's choice of link and crosspoints from local memory (action
195), marks these links idle (action 198) and sends the two XPC orders to release them (actions 186 and
188). Thereafter, test 190 controls the wait for an acknowledgment from the first stage controller and the
ACK is sent to the MINT (action 102). If there is no record of this connection, the MANSC returns a "Sanity
NAK." The MANSC senses status from the outlet's phase alignment and scramble circuit (PASC) 290 to
verify that some data transfer took place.

Except for seizure and release of resources, the above steps for one request are independent of other
requests' steps in the same MANSC and thus are pipelined, to increase MANSC throughput. Still more
power is achieved through parallel operations; the path hunt begins at the same time as the busy/idle
check. Note that the transaction rate depends on the longest step in a pipelined process, but the response
time for one given transaction (from request to ACK or NAK) is the sum of the step times involved. The
latter is improved by parallelism but not by pipelining.
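
The distinction between transaction rate and response time can be made concrete with a small calculation. The step times below are invented purely for illustration, and `pipeline_metrics` is a hypothetical helper; each stage is given as a tuple of steps that run in parallel.

```python
def pipeline_metrics(stages):
    """Return (transactions per second, response time in microseconds).

    `stages` is a list of tuples; each tuple holds the times, in microseconds,
    of the steps executed in parallel within one pipeline stage.
    """
    stage_times = [max(parallel_steps) for parallel_steps in stages]
    rate_per_second = 1e6 / max(stage_times)   # set by the slowest stage
    response_us = sum(stage_times)             # one request visits every stage
    return rate_per_second, response_us


# Busy/idle check (4 us), path hunt (10 us), crosspoint orders (6 us): assumed values.
serial = pipeline_metrics([(4,), (10,), (6,)])
overlapped = pipeline_metrics([(4, 10), (6,)])   # check and hunt started together
```

With these assumed figures both arrangements sustain the same rate, since the 10 microsecond step is the bottleneck either way, while overlapping the check with the hunt shortens the response time from 20 to 16 microseconds.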



3.3.4 Error Detection and Diagnosis

We avoid adding costly hardware, message bits, and time-wasting protocols to the CNet and its nodes
to verify every little message. For example, each crosspoint order from a MANSC to an XPC does not require
an echo of the command or even an ACK in return. Instead, MANSCs assume that messages arrive
uncorrupted and are acted on correctly, until evidence to the contrary arrives from outside. Audits and
cross-checks are enabled only when there is cause for suspicion. The end users, NIMs and MINTs soon
discover a defect in the MANS or its control complex, and identify the subset of MANS ports involved. Then
the diagnostic task is to isolate the problem for repair and interim work-around.

Once a portion of the MANS is suspect, temporary auditing modes could be turned on to catch the
guilty parties. For suspected 1SCs and MANSCs, these modes require use of the command ACKs and
echoing. Special messages such as crosspoint audits may also be passed through the CNet. This should
be done while still carrying a light load of user traffic.

Before engaging these internal self-tests (or perhaps to eliminate them entirely), MAN can run
experiments on the MANS to pinpoint the failed circuit, using the MINTs, ILs, and NIMs. For example, if
75% of the test SUWUs sent from a given IL make it to a given outlet, we would conclude that one of the
two links from one of that IL's two first stages is defective, since one of the four candidate paths is failing.
(Note this test must be run under load, lest the deterministic MANSC always select the same link.) Further
experiments can isolate that link. But if several MINTs are tested and none can send to a particular outlet,
then that outlet is marked "out of service" to all MINTs and suspicion is now focused on that second stage
and its MANSC. If other outlets on that stage work, the fault is in the second stage's fabric. These tests use
the status lead from each of a MANSC's 16 PASCs.

Coordinating the independent MINTs and NIMs to run these tests requires a central intelligence with
low-bandwidth message links to all MINTs and NIMs. Given inter-MINT connectivity (see FIG. 15), any
MINT with the needed firmware can take on a diagnostic task. NIMs must be involved anyway to tell
whether test SUWUs reach their destinations. Of course any NIM on a working MINT can exchange
messages with any other such NIM.


3.4 MAN Switch Controller 

FIG. 25 is a diagram of MANSC 140. This is the unit which sends control instructions to data network
120 to set up or tear down circuit connections. It receives orders from control network 130 via link 139 and
sends acknowledgments, both positive and negative, back to the requesting MINTs 11 via control network
135. It also sends instructions via control network 134 to first stage switch controller 122 and directly to
the second stage controller 124 that is associated with the specific MANSC 140.

Inputs are received from link 139 at a request intake port 1402. They are processed by intake control
1404 to see if the requested outlet is busy. The outlet memory 1406 contains busy/idle indications of the
outlets for which a MANSC 140 is responsible. If the outlet is idle, a connect request is placed into one of
two queues 150 and 152 previously described with respect to FIG. 8. If the request is for a disconnect, the
request is placed in disconnect queue 170. The outlet map 1406 is updated to mark a disconnected outlet
idle. The acknowledge response unit 1408 sends negative acknowledgments if a request is received with an
error, if a connect request is made to a busy outlet, or if the appropriate queue 150 or 152 is full.
Acknowledgment responses are sent via control network 135 back to the requesting MINT 11 via distributor
138. All of these actions are performed under the control of intake control 1404.

Service control 1420 controls the setup of paths in data network 120 and the updating of outlet memory
1406 for those circumstances in which no path is available in the data network between the requesting input
link and an available output link. The intake control also updates outlet memory 1406 on connect requests
so that a request which is already in the queue will block another request for the same output link.

Service control 1420 examines requests in the three queues 150, 152, and 170. Disconnect requests
are always given the highest priority. For disconnect requests, the link memory 1424 and path memory
1426 are examined to see which links should be made idle. The instructions for idling these links are sent
to first stage switches from first stage switch order port 1428 and the instructions to second stage switches
are sent from second stage switch order port 1430. For connect requests, the static map 1422 is consulted
to see which links can be used to set up a path from the requesting input link to the requested output link.
Link map 1424 is then consulted to see if appropriate links are available and if so these links are marked
busy. Path memory 1426 is updated to show that this path has been set up so that on a subsequent
disconnect order the appropriate links can be made idle. All of these actions are performed under the
control of service control 1420.

Controllers 1420 and 1404 may be a single controller or separate controllers and may be program
controlled or controlled by sequential logic. There is a great need for very high-speed operation in these
controllers because of the high throughput demanded, which makes a hard-wired controller preferable.

3.5 Control Network

Control message network 130 (FIG. 7) takes outputs 137 from data concentrators 136 and transmits
these outputs, representing connect or disconnect requests, to MAN switch controllers 140. Outputs of
concentrators 136 are stored temporarily in source registers 133. Bus access controller 131 polls these
source registers 133 to see if any have a request to be transmitted. Such requests are then placed on bus
132, whose output is stored temporarily in intermediate register 141. Bus access controller 131 then sends
outputs from register 141 to the appropriate one of the MAN switch controllers 140 via link 139 by placing
the output of register 141 on bus 142 connected to link 139. The action is accomplished in three phases.
During the first phase, the output of register 133 is placed on the bus 132, thence gated to register 141.
During the second phase, the output of register 141 is placed on bus 142 and delivered to a MAN switch
controller 140. During the third phase, the MAN switch controller signals the source register 133 as to
whether the controller has received the request; if so, source register 133 can accept a new input from
control data concentrator 136. Otherwise, source register 133 retains the same request data and the bus
access controller 131 will repeat the transmission later. The three phases may occur simultaneously for
three separate requests. Control networks 134 and 135 operate in a fashion similar to control network 130.

3.6 Summary

A structure to meet the large bandwidth and transaction throughput requirements for the MANS has
been described. The data switch fabric is a two-stage Richards network, chosen because its low blocking
probability permits a parallel, pipelined distributed switch control complex (SCC). The SCC includes XPCs
in all first and second stage switches, an intelligent controller MANSC with each second stage, and the
CNet that ties the control pieces together and links them to the MINTs.



The memory and interface module (MINT) provides receive interfaces for the external fiber-optic links,
buffer memory, control for routing and link protocols, and transmitters to send collected data over the links
to the MAN switch. In the present design, each MINT serves four network interface modules (NIMs) and has
four links to the switch. The MINT is a data switching module.



4.1 Basic Functions

The basic functions of the MINT are to provide the following:

1. A fiber-optic receiver and link protocol handler for each NIM.

2. A link handler and transmitter for each link to the switch.

3. A buffer memory to accumulate packets awaiting transmission across the switch.

4. An interface to the controller for the switch to direct the setup and teardown of network paths.

5. Control for address translation, routing, making efficient use of the switch, orderly transmission of
accumulated packets and management of buffer memory.

6. An interface for operation, administration, and maintenance of the overall system.

7. A control channel to each NIM for operation, administration, and maintenance functions.



4.2 Data Flow 



In order to understand the descriptions of the individual functional units that make up a MINT, it is first
necessary to have a basic understanding of the general flow of data and control. FIG. 10 shows an overall
view of the MINT. Data enters the MINT on a high-speed (100-150 Mbit/s) data channel 3 from each NIM.
This data is in the form of packets, on the order of 8 Kilobits long, each with its own header containing
routing information. The hardware allows for packet sizes in increments of 512 bits to a maximum of 128
Kilobits. Small packet sizes, however, reduce throughput due to the per-packet processing required. Large
maximum packet sizes result in wasted memory for transactions of less than a maximum size packet. The
link terminates on an external link handler 16 (XLH), which retains a copy of the pertinent header fields as it
deposits the entire packet into the buffer memory. This header information, together with the buffer memory
address and length, is then passed to the central control 20. The central control determines the destination
NIM from the address and adds this block to the list of blocks (if any) awaiting transmission to this same
destination. The central control also sends a connection request to the switch controller if there is not
already a request outstanding. When the central control receives an acknowledgement from the switch
controller that a connection request has been satisfied, the central control transmits the list of memory
blocks to the proper internal link handler 17 (ILH). The ILH reads the stored data from memory and
transmits it at high speed (probably the same speed as the incoming links) to the MAN switch, which
directs it to its destination. As the blocks are transmitted, the ILH informs the central control so that the
blocks can be added to the list of free blocks available for use by the XLHs.
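
The bookkeeping performed by the central control in this flow can be sketched as follows. The class and method names are hypothetical, NAK handling and the free-block list are omitted, and destinations are represented by plain integers; the sketch only shows how blocks for one destination accumulate behind a single outstanding connection request.

```python
from collections import defaultdict


class CentralControl:
    """Toy model of the per-destination block lists kept by the MINT central control."""

    def __init__(self):
        self.pending = defaultdict(list)   # destination -> list of (address, length)
        self.outstanding = set()           # destinations with a request in flight
        self.requests_sent = []            # connection requests issued, in order

    def block_arrived(self, dest, address, length):
        """Called when an XLH reports a packet stored at `address`."""
        self.pending[dest].append((address, length))
        if dest not in self.outstanding:   # at most one request per destination
            self.outstanding.add(dest)
            self.requests_sent.append(dest)

    def ack_received(self, dest):
        """Connection granted: return every queued block, to be handed to the ILH."""
        self.outstanding.discard(dest)
        return self.pending.pop(dest, [])
```

A second packet for the same destination that arrives before the acknowledgement simply joins the list, so one switch connection carries both blocks.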



4.3 Memory Modules

The buffer memory 13 (FIG. 4) of the MINT 11 satisfies three requirements:

1. The quantity of memory provides sufficient buffer space to hold the data accumulated (for all
destinations) while awaiting switch setups.

2. The memory bandwidth is adequate to support simultaneous activity on all eight links (four
receiving and four transmitting).

3. The memory access provides for efficient streaming of data to and from the link handlers.



4.3.1 Organization

Because of the amount of memory required (Megabytes), it is desirable to employ conventional
high-density dynamic random access memory (DRAM) parts. Thus, high bandwidth can be achieved only by
making the memory wide. The memory is therefore organized into 16 modules 201,...,202 which make up a
composite 512-bit word. As will be seen below, memory accesses are organized in a synchronous fashion
so that no module ever receives successive requests without sufficient time to perform the required cycles.
The range of memory for one MINT 11 in a typical MAN application is Mbytes. The number is
sensitive to the speed of application of flow control in overload situations.
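
With 16 modules making up a 512-bit word, each module holds a 32-bit slice of every word, which matches the 32-bit width of the ring described later. A short sketch of that striping is given below; the function names are hypothetical, and the choice of putting the least significant slice in module 0 is an assumption made only for the example.

```python
MODULES = 16
SLICE_BITS = 512 // MODULES   # 32 bits per module


def split_word(word512: int) -> list:
    """Split one 512-bit composite word into sixteen 32-bit module slices."""
    mask = (1 << SLICE_BITS) - 1
    return [(word512 >> (SLICE_BITS * m)) & mask for m in range(MODULES)]


def join_word(slices: list) -> int:
    """Reassemble the composite word from the module slices."""
    word = 0
    for m, value in enumerate(slices):
        word |= value << (SLICE_BITS * m)
    return word
```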



4.3.2 Time Slot Assignors 

The time slot assignors 203,...,204 (TSAs) combine the functions of a conventional DRAM controller and
a specialized 8-channel DMA controller. Each receives read/write requests from logic associated with the
Data Transport Ring 19 (see §4.4 below). Its setup commands come from dedicated control time slots on
this same ring.



4.3.2.1 Control

From a control viewpoint, the TSA appears as a set of registers as shown in FIG. 11. For each XLH
there is an associated address register 210 and count register 211. Each ILH also has address 214 and
count 213 registers, but in addition has registers containing the next address 216 and count 215, thus
allowing a series of blocks to be read from memory in a continuous stream with no inter-block gaps. A
special set of registers 220-226 allows the MINT's central control section to access any of the internal
registers in the TSA or to perform a directed read or write of any particular word in memory. These
registers include a write data register 220 and read data register 221, a memory address register 222,
channel status register 223, error register 224, memory refresh row address register 225, and diagnostic
control register 226.



4.3.2.2 Operation 

In normal operation, the TSA 203 receives only four order types from the ring interface logic: (1) "write"
requests for data received by an XLH, (2) "read" requests for an ILH, (3) "new address" commands issued
by either an XLH or an ILH, and (4) "idle cycle" indications which tell the TSA to perform a refresh cycle or
other special operation. Each order is accompanied by the identity of the link handler involved and, in the
case of "write" and "new address" requests, by 32 bits of data.

For a "write" operation, the TSA 203 simply performs a memory write cycle using the address from the
register associated with the indicated XLH 16 and the data provided by the ring interface logic. It then
increments the address register and decrements the count register. The count register is used in this case
only as a safety check since the XLH should provide a new address before overflowing the current block.

For a "read" operation, the TSA 203 must first check whether the channel for this ILH is active. If it is,
the TSA performs a memory read cycle using the address from the register for this ILH 17 and presents the
data to the ring interface logic. It also increments the address register and decrements the count register. In
any case, the TSA provides the interface logic with two tag bits which indicate (1) no data available, (2)
data available, (3) first word of packet available, or (4) last word of packet available. For case (4), the TSA
will load the ILH's address 214 and count 213 registers from its "next address" 216 and "next count" 215
registers, provided that these registers have been loaded by the ILH. If they have not, the TSA marks the
channel "inactive."

From the above descriptions, the function of a "new address" operation can be inferred. The TSA 203
receives the link identity, a 24-bit address, and an 8-bit count. For an XLH 16, it simply loads the associated
registers. In the case of an ILH 17, the TSA must check whether the channel is active. If it is not, then the
normal address 214 and count 213 registers are loaded and the channel is marked active. If the channel is
currently active, then the "next address" 216 and "next count" 215 registers must be loaded instead of the
normal address and count registers.
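
The "read" and "new address" behavior of one ILH channel inside one TSA can be modeled as a small state machine. The sketch below is an assumption-laden illustration: the class `IlhChannel`, its field names, and the string tags are hypothetical, each block is treated as one packet, and the rule that only module 0 tags the first word and only module 15 tags the last word is taken from the tag-bit description in section 4.4.3.1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IlhChannel:
    """One ILH channel of one TSA (memory module number `module`, 0..15)."""
    module: int
    active: bool = False
    address: int = 0
    count: int = 0
    pending: Optional[Tuple[int, int]] = None   # "next address" / "next count"
    at_start: bool = False

    def _load(self, address: int, count: int) -> None:
        self.address, self.count = address, count
        self.active, self.at_start = True, True

    def new_address(self, address: int, count: int) -> None:
        if self.active:
            self.pending = (address, count)     # channel busy: fill the "next" registers
        else:
            self._load(address, count)          # channel idle: load and activate

    def read(self, memory):
        if not self.active:
            return None, "no data"
        word = memory[self.address]
        first, self.at_start = self.at_start, False
        self.address += 1
        self.count -= 1
        last = self.count == 0
        if last:
            if self.pending is not None:        # chain to the next block without a gap
                address, count = self.pending
                self.pending = None
                self._load(address, count)
            else:
                self.active = False
        if last and self.module == 15:
            return word, "last"
        if first and self.module == 0:
            return word, "first"
        return word, "data"
```

Queuing a second block while the first is still being read is what lets the ILH stream a series of blocks with no inter-block gaps.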

In an alternative embodiment, the two tag bits are also stored in buffer memory 201,...,202.
Advantageously, this permits packet sizes that are not limited to being a multiple of the overall width of the
memory (512 bits). In addition, the ILH 17 need not provide the actual length of the packet when reading it,
thus relieving the central control 20 of the need to pass along this information to the ILH.



4.4 Data Transport Ring 

It is the job of the Data Transport Ring 19 to carry control commands and high-speed data between the
link handlers 16,17 and the memory modules 201,...,202. The ring provides sufficient bandwidth to allow all
the links to run simultaneously, but carefully apportions this bandwidth so that circuits connecting to the ring
are never required to transfer data in high-speed bursts. Instead, a fixed time slot cycle is employed that
assigns slots to each circuit at well-spaced intervals. The use of this fixed cycle also means that source and
destination addresses need not be carried on the ring itself since they can be readily determined at any
point by a properly synchronized counter.



4.4.1 Electrical Description 

The ring is 32 data bits wide and is clocked at 24 MHz. This bandwidth is sufficient to support data
rates of up to 150 Mbit/s. In addition to the data bits, the ring contains four parity bits, two tag bits, a sync
bit to identify the start of a superframe, and a clock signal. Within the ring, single-ended ECL circuitry is
used for all signals except the clock, which is differential ECL. The ring interface logic provides connecting
circuits with TTL-compatible signal levels.
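
The 150 Mbit/s figure can be checked with a short calculation, using the frame structure given in section 4.4.3 (80 time slots per data frame, 64 of them carrying data). The function name and the assumption that the data slots are shared equally by four links in each direction are illustrative only.

```python
def ring_rates(width_bits=32, clock_hz=24_000_000,
               frame_slots=80, data_slots=64, links=4):
    """Return (raw ring rate, data rate, per-link data rate) in bit/s."""
    raw = width_bits * clock_hz                # every slot carries one 32-bit word
    data = raw * data_slots // frame_slots     # control slots carry no packet data
    return raw, data, data // links


raw_rate, data_rate, per_link_rate = ring_rates()
```

Under these assumptions the ring moves 768 Mbit/s in total, 614.4 Mbit/s of it in data slots, or 153.6 Mbit/s per link, which is slightly above the 150 Mbit/s quoted. Because each data slot is reused for a write and a read, this per-link rate is available to the XLHs and to the ILHs at the same time.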



4.4.2 Time Slot Sequencing Requirements

In order to meet the above objectives, the time slot cycle is subject to a number of constraints:

1. During each complete cycle there must be a unique time slot for each combination of source and
destination.

2. Each connecting circuit must see its data time slots appearing at reasonably regular intervals.
Specifically, each circuit must have a certain minimum interval between its data time slots.

3. Each link handler must see its data time slots in numerical order by memory module number.
(This is to avoid making the link handler shuffle a 512-bit word.)

4. Each TSA must have a known interval during which it can perform a refresh cycle or other
miscellaneous memory operation.

5. Since the TSAs in the memory modules must examine every control time slot, there must also be
a minimum interval between control time slots.



4.4.3 Time Slot Cycle

Table I shows one data frame of a timing cycle which meets these requirements. One data frame
consists of a total of 80 time slots, of which 64 are used for data and the remaining 16 for control. The table
shows, for each memory module TSA, the slot during which it receives data from each XLH to be written
into memory and during which it must supply data that was read from memory for each ILH. Every fifth slot
is a control time slot during which the indicated link handler broadcasts control orders to all the TSAs. For
the purposes of this table, XLHs and ILHs are numbered 0-3, and TSAs are numbered 0-15. TSA 0, for
example, during time slot 0 receives data from XLH 0 and must supply data for ILH 0. During slot 17, TSA 0
performs similar operations for XLH 2 and ILH 2. Slot 46 is used for XLH 1 and ILH 1, and slot 63 is used
for XLH 3 and ILH 3. The re-use of the same time slot for reading and writing is permissible since XLHs
never read from memory and ILHs never write, thus effectively doubling the data bandwidth of the ring.

The control time slots are assigned, in sequence, to the four XLHs, the four ILHs, and the central
control (CC). With these nine entities sharing the control time slots, the control frame is 45 time slots long.
The 80-slot data frame and the 45-slot control frame come into alignment every 720 time slots. This period
is the superframe and is marked by the superframe sync signal.

There is a subtle synchronization condition that must also be met for the ILHs. The words of a block
must be sent in sequence beginning with word 0, regardless of where in the ring timing cycle the order was
received. To assist in meeting this requirement, the ring interface circuitry provides a special "word 0" sync
signal for each ILH. For example, in the timing cycle of Table I a new address might be sent by ILH 0
during time slot 24 (its control time slot). It is necessary to ensure that TSA number 0 is the first TSA to act
on this new address (requirement 3 in section 4.4.2) even though the data time slots for reads from TSAs
numbered 5 through 15 for ILH 0 immediately follow time slot 24.

Since the number of time slots in the superframe, 720, exceeds the number of elements on the ring, 25,
it is apparent that the logical time slots do not have a permanent existence; each time slot is, in effect,
created at a particular physical location on the ring and propagates around the ring until it returns to this
location, where it vanishes. The effective creation point is different for data time slots than for control time
slots.

TABLE I
RING TIME SLOT ASSIGNMENT

Time    Write to   From    Read from   To      Control
Slot    TSA        XLH     TSA         ILH     Slot Source
00      0          0       0           0
01      7          1       7           1
02      13         2       13          2
03      4          3       4           3
04                                             XLH0
05      1          0       1           0
06      8          1       8           1
07      14         2       14          2
08      5          3       5           3
09                                             XLH1
10      2          0       2           0
11      9          1       9           1
12      15         2       15          2
13      6          3       6           3
14                                             XLH2
15      3          0       3           0
16      10         1       10          1
17      0          2       0           2
18      7          3       7           3
19                                             XLH3
20      4          0       4           0
21      11         1       11          1
22      1          2       1           2
23      8          3       8           3
24                                             ILH0
25      5          0       5           0
26      12         1       12          1
27      2          2       2           2
28      9          3       9           3
29                                             ILH1
30      6          0       6           0
31      13         1       13          1
32      3          2       3           2
33      10         3       10          3
34                                             ILH2
35      7          0       7           0
36      14         1       14          1
37      4          2       4           2
38      11         3       11          3
39                                             ILH3
40      8          0       8           0
41      15         1       15          1
42      5          2       5           2
43      12         3       12          3
44                                             CC
45      9          0       9           0
46      0          1       0           1
47      6          2       6           2
48      13         3       13          3
49                                             XLH0
50      10         0       10          0
51      1          1       1           1
52      7          2       7           2
53      14         3       14          3
54                                             XLH1
55      11         0       11          0
56      2          1       2           1
57      8          2       8           2
58      15         3       15          3
59                                             XLH2
60      12         0       12          0
61      3          1       3           1
62      9          2       9           2
63      0          3       0           3
64                                             XLH3
65      13         0       13          0
66      4          1       4           1
67      10         2       10          2
68      1          3       1           3
69                                             ILH0
70      14         0       14          0
71      5          1       5           1
72      11         2       11          2
73      2          3       2           3
74                                             ILH1
75      15         0       15          0
76      6          1       6           1
77      12         2       12          2
78      3          3       3           3
79                                             ILH2
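
The assignment in Table I follows a regular pattern that can be generated rather than stored. The sketch below is a reading of the table, not a rule stated in the text: the starting TSA offsets (0, 7, 13, 4) for link handlers 0 to 3 were read off the table entries, and the names `TSA_OFFSET`, `CONTROL_OWNERS`, and `slot_assignment` are hypothetical.

```python
from math import gcd

TSA_OFFSET = (0, 7, 13, 4)   # first TSA served for link handlers 0..3 (read off Table I)
CONTROL_OWNERS = ("XLH0", "XLH1", "XLH2", "XLH3",
                  "ILH0", "ILH1", "ILH2", "ILH3", "CC")


def slot_assignment(slot: int):
    """Describe one time slot of the cycle, counting from the superframe sync.

    Returns ("control", owner) for every fifth slot, otherwise
    ("data", tsa, link_handler): the TSA written by XLH `link_handler`
    and read for ILH `link_handler` during that slot.
    """
    group, position = divmod(slot, 5)
    if position == 4:
        return ("control", CONTROL_OWNERS[group % 9])
    return ("data", (group + TSA_OFFSET[position]) % 16, position)


SUPERFRAME = 80 * 45 // gcd(80, 45)   # data and control frames realign here
```

Each link handler visits the memory modules in cyclic numerical order, one module every five slots, which is how the generated cycle satisfies requirement 3 of section 4.4.2.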



4.4.3.1 Data Time Slots

Data time slots can be considered to originate at the owning XLH. A data time slot is used to carry
incoming data to its assigned memory module, at which point it is re-used to carry outgoing data to the
corresponding ILH. Since XLHs never receive information from a data time slot, the ring can be considered
to be logically broken (for data time slots only) between the ILHs and the XLHs.

The two tag bits identify the contents of the data time slots as follows:
11 Empty
10 Data
01 First word of packet
00 Last word of packet

The "first word of packet" indication is sent only by memory module 0 when it sends the first word of a
packet to an ILH. The "last word of packet" indication is sent only by memory module 15 when it sends the
end of a packet to an ILH.



4.4.3.2 Control Time Slots



Control time slots originate and terminate at the station of central control 20 on the ring. The link
handlers use their assigned control slots only to broadcast orders to the TSAs. The CC is assigned every
ninth control time slot. The TSAs receive orders from all control time slots and send responses back to the
CC on the CC control time slot.

The two tag bits identify the contents of a control time slot as follows:
11 Empty
10 Data (to or from CC)
01 Order
00 Address & count (from a link handler)

4.5 External Link Handler

The principal function of the XLH is to terminate the incoming high-speed data channel from a NIM,
deposit the data in the MINT's buffer memory, and pass the necessary information to the MINT's central
control 20 so that the data can be forwarded to its destination. In addition, the XLH terminates an incoming
low-speed control channel that is multiplexed on the fiber link. Some of the functions assigned to the
low-speed control channel are the transmission of the NIM status and control of flow in the network. It should
be noted that the XLH is only terminating the incoming fiber from the NIM. Transmission to the NIM is handled
by the internal link handler and the phase alignment and scrambler circuit that will be described later. The
XLH uses an onboard processor 268 to interface to the hardware of the MINT central control 20. The four
20 Mbit/sec links coming from this processor provide the connectivity to the central control section of the
MINT. FIG. 12 shows an overall view of the XLH.



4.5.1 Link Interface

The XLH contains the fiber optic receiver, clock recovery circuit and descrambler circuit needed to
recover data from the fiber. After the data clock is recovered (block 250) and the data descrambled (block
252), the data is then converted from serial to parallel and demultiplexed (block 254) into the high-speed
data channel and the low-speed data channel. Low level protocol processing is then performed on the data
on the high-speed data channel (block 255) as described in §5. This results in a data stream consisting of
only packet data. The stream of packet data then goes through a first-in-first-out (FIFO) queue 258 to a data
steering circuit 260 which steers the header into the header FIFO 266 and sends the complete packet to the
XLH's ring interface 262.



4.5.2 Ring Interface

The ring interface 262 logic controls transfer of data from the packet FIFO 258 in the link interface to
the MINT's buffer memory. It provides the following functions:

1. Establishing and maintaining synchronization with the ring's timing cycle.

2. Transfer of data from the link interface FIFO to the proper ring time slots.

3. Sending a new address to the memory TSAs when the end of a packet is encountered.

It should be noted that resynchronization with the ring's 16-word (per XLH) timing cycle will have to be
performed during the processing of a packet whenever the link interface FIFO becomes temporarily empty.
This will be a normal occurrence since the ring's bandwidth is higher than the link's transmission rate. The
ring and TSA, however, are designed to accommodate gaps in the data stream. Thus, resynchronization
consists simply of waiting for data to become available and for the ring cycle to return to the proper word
number, marking the intervening time slots "empty." For example, if the FIFO 258 becomes empty when a
word destined for the fifth memory module is needed, it is necessary to ensure that the next word actually
sent goes to that memory module, in order to preserve the overall sequence.
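
The resynchronization rule can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: it treats the ring as a sequence of data time slots whose word number is the slot index modulo 16, and it assumes each word number corresponds to one memory module. The function name and the `arrivals` argument are hypothetical and do not come from the specification.

```python
from collections import deque

CYCLE_WORDS = 16  # ring timing cycle length per XLH, as stated in the text
EMPTY = None      # stands in for a time slot marked "empty"

def fill_ring_slots(arrivals, num_slots):
    """Model which word the XLH places in each of its ring time slots.

    arrivals maps a slot index to the list of words that enter the link
    interface FIFO just before that slot (a hypothetical test input).
    """
    fifo = deque()
    next_word_number = 0  # position in the 16-word cycle the next data word must occupy
    slots = []
    for slot in range(num_slots):
        fifo.extend(arrivals.get(slot, []))
        if fifo and slot % CYCLE_WORDS == next_word_number:
            # Data is available and the cycle is at the proper word number.
            slots.append(fifo.popleft())
            next_word_number = (next_word_number + 1) % CYCLE_WORDS
        else:
            # Either the FIFO is empty or the cycle has not yet returned
            # to the proper word number: mark the slot empty and wait.
            slots.append(EMPTY)
    return slots
```

With four words available at the start and a fifth arriving two slots late, the fifth word is held back until the cycle returns to word number 4 (the fifth position), one full cycle later, and every slot in between is marked empty.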






4.5.3 Control

The control portion of the XLH is responsible for replenishing the free block FIFO 270 and passing the
header information about each packet received to the MINT's central control 20 (FIG. 4).



4.5.3.1 Header Processing

At the same time a packet is being transmitted on the ring, the header of the packet is deposited in the
header FIFO 266 that is subsequently read by the XLH processor 268. In this header are the source and
destination address fields, which the central control will require for routing. In addition, the header checksum
is verified to ensure that these fields have not been corrupted. The header information is then packaged
with a memory block descriptor (address and length) and sent in a message to the central control 20 (FIG.



4.5.3.2 Interaction with Central Control

There are only two basic interactions with the MINT's central control. The XLH control attempts to keep
free-block FIFO 270 full with block addresses obtained from the memory manager, and it passes header
information and memory block descriptors to the central control so that the block can be routed to its
destination. The block addresses are subsequently placed on the ring 19 by ring interface 262 upon receipt
of the address from control sequencer 272. Both interactions with the central control are carried out over
links from XLH processor 268 to the appropriate sections of the central control.



4.6 Internal Link Handler

The internal link handler (ILH) (FIG. 13) is the first part of what can be considered a distributed link
controller. At any instant in time this distributed link controller consists of a particular ILH, a path through
the switch fabric and a particular Phase Alignment and Scrambler Circuit 290 (PASC). The PASC is
described in section 6.1. It is the PASC that is actually responsible for the transmission of optical signals
over the return fiber of fiber pair 3 to the NIM from the MINT. The information that is transmitted over the
fiber comes from the MANS 10, which receives inputs at different times from the ILHs sending to that NIM.
This kind of distributed link controller is necessary since path lengths through the MAN switch fabric are not
all equal. If the PASC did not align all of the information coming from different ILHs to the same reference
clock, information received by the NIM would be continually changing its phase and bit alignment.

The combination of the ILH with the PASC is in many ways a mirror image of the XLH. The ILH
receives lists of block descriptors from the central control, reads these blocks from memory, and transmits
the data over the serial link to the switch. As data is received from memory, the associated block descriptor
is sent to the central control's memory manager so that the block can be returned to the free list.

The ILH differs from the XLH in that the ILH performs no special header processing, and the TSAs
provide the ILH with additional pipelining so that multiple blocks can be transmitted as a continuous stream
if desired.

4.6.1 Link Interface

The link interface 289 provides the serial transmitter for the data channel. Data is transmitted in a
frame-synchronous format compatible with the link data format described in §5. Since the data is received
from the ring interface 280 (see below) asynchronously and at a rate somewhat higher than the link's
average data rate, the link interface contains a FIFO 282 to provide speed matching and frame
synchronization. The data is received from MINT memory via data ring interface 280, stored in FIFO 282,
processed by level 1 and 2 protocol handler 286, and transmitted to MAN switch 10 through the parallel to
serial converter 288 within link interface 289.



4.6.2 Ring Interface

The ring interface 280 logic controls the transfer of data from the MINT's buffer memory to the FIFO in
the link interface. It provides the following functions:

1. Establishing and maintaining synchronization with the ring's timing cycle.

2. Transfer of data from the ring to the link interface FIFO during the proper ring time slots.

3. Notifying the control section when the last word of a packet (memory block) is received.

4. Sending a new address and count (if available) to the memory TSAs 203,...,204 (FIG. 10) when the
last word of a packet is received and the condition of the FIFO 282 is such that the new packet will not
cause an overflow.

Unlike the XLH, the ILH relies on the TSAs to ensure that data words are received in sequence and with no
gaps within a block. Thus, maintaining word synchronization in this case consists simply of looking for
unexpected empty data time slots.



4.6.3 Control

The control portion of the ILH, controlled by sequencer 283, is responsible for providing the ring
interface with block descriptors received via the processor link interface 284 from the central control and
stored therefrom in address FIFO 285, notifying the central control via the processor link interface when
blocks have been retrieved from memory, and notifying the central control 20 when transmission of the final
block is complete.



4.6.3.1 Interaction with Central Control



There are only three basic interactions with the MINT's central control:

1. Receiving lists of block descriptors.

2. Informing the memory manager of blocks that have been retrieved from memory.

3. Informing the switch request queue manager when all blocks have been transmitted.

In the present design, all of these interactions are carried out over Transputer links to the appropriate
sections of the central control.



4.6.3.2 Interaction with TSAs

Like the XLH, the ILH uses its control time slots to send block descriptors (address and length) to the
TSAs. When the TSAs receive a descriptor from an ILH, however, they will immediately begin reading the
block from memory and placing the data on the ring. The length field from an ILH is significant and
determines the number of words that will be read by each TSA before moving on to the next block. The
TSAs also provide each ILH with registers to hold the next address and length, so that successive blocks
can be transmitted without gaps. Flow control is the responsibility of the ILH, however, and a new descriptor
should not be sent to the TSAs until there is enough room in the packet FIFO 282 to compensate for
reframing time and the difference in transmission rates.



4.7 MINT Central Control

FIG. 14 is a block diagram of MINT central control 20. This central control is connected to the four XLH
16s of the MINT, the four ILH 17s of the MINT, to data concentrator 136 and distributor 138 of the switch
control (see FIG. 7), and to an OA&M central control 352 shown in FIG. 15. The relationship of the central
control 20 with other units will first be discussed.

The MINT central control communicates with XLH 16 to provide memory block addresses for use by
the XLH in order to store incoming data in the MINT memory. XLH 16 communicates with the MINT central
control to provide the header of a packet to be stored in MINT memory, and the address where that packet
is to be stored. Memory manager 302 of MINT central control 20 communicates with ILH 17 to receive
information that memory has been released by an ILH because the message stored in those memory
blocks has been delivered, so that the released memory can be reused.

When queue manager 311 recognizes that the first network unit arriving for a particular NIM has been
queued in switch unit queue 314, which contains FIFO queues 316 for each possible destination NIM,
queue manager 311 sends a request to switch setup control 313 to request a connection in MAN switch 10
to that NIM. The request is stored in one of the queues 318 (priority) and 312 (regular) of switch setup
control 313. Switch setup control 313 administers these requests according to their priority and sends
requests to MAN switch 10, specifically to switch control data concentrator 136. For normal loads, the
queues 318 and 312 should be almost empty since requests can normally be made almost immediately
and will generally be processed by the appropriate MAN switch controller. For overload conditions, the
queues 318 and 312 become a means for deferring transmission of lower priority packets while retaining
the relatively fast transmission of priority packets. If experience so dictates, it may be desirable to move a
request from the regular queue to the priority queue if a priority packet for that destination NIM is received.
Requests queued in queues 318 and 312 do not tie up an IL and ILH and an output link of circuit switch
10; this is in contrast to requests in the queues 150, 152 (FIG. 3) of a MAN switch controller 140 (FIG. 7).
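
The two-queue arrangement of switch setup control 313 can be sketched as follows. This is a minimal model, assuming strict priority between the two queues and first-in-first-out order within each; the class and method names are hypothetical, and the promotion step models the optional behavior the text says "may be desirable," not a confirmed feature of the design.

```python
from collections import deque

class SwitchSetupControl:
    """Toy model of the priority (318) and regular (312) request queues."""

    def __init__(self):
        self.priority = deque()  # stands in for queue 318
        self.regular = deque()   # stands in for queue 312

    def request(self, nim, priority=False):
        """Queue a connection request for a destination NIM."""
        (self.priority if priority else self.regular).append(nim)

    def promote(self, nim):
        """Move a pending regular request to the priority queue, if present."""
        if nim in self.regular:
            self.regular.remove(nim)
            self.priority.append(nim)

    def next_request(self):
        """Return the next request to send to the MAN switch, or None."""
        if self.priority:
            return self.priority.popleft()
        if self.regular:
            return self.regular.popleft()
        return None
```

Under normal load both queues would be nearly empty and `next_request` would return each request almost as soon as it is made; the ordering only matters under overload.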

When switch setup control 313 recognizes that a connection has been established in switch 10, it
notifies NIM queue manager 311. The ILH 17 receives data from a FIFO queue 316 in switch unit queue
314 from NIM queue manager 311 to identify a queue of the memory locations of data packets which may
be transmitted to the circuit switch, and for each packet, a list of one or more ports on the NIM to which
that packet is to be transmitted. NIM queue manager 311 then causes ILH 17 to prefix the port number(s) to
each packet and to transmit data for each packet from memory 18 to switch 10. The ILH then proceeds to
transmit the packets of the queue and, when it has completed this task, notifies the switch setup control 313
that the connection in the circuit switch may be disconnected and notifies memory manager 302 of the
identity of the blocks of memory that can now be released because the data has been transmitted.

The MINT central control uses a plurality of high speed processors, each of which has one or more
input/output ports. The specific processor used in this implementation is the Transputer manufactured by
INMOS Corporation. This processor has four input/output ports. Such a processor can meet the processing
demands of the MINT central control.

Packets come into the four XLHs 16. There are four XLH managers 305, source checkers 307, routers
309, and OA&M MINT processors 315, one corresponding to each XLH within the MINT; these processors,
operating in parallel to process the data entering each XLH, increase the total data processing capacity of
the MINT central control.

The header for each packet entering an XLH is transmitted, along with the address where that packet is
being stored, directly to an associated XLH manager 305 if the header has passed the hardware check of
the cyclic redundancy code (CRC) of the header performed by the XLH. If that CRC check fails, the packet
is discarded by the XLH, which recycles the allocated memory block. The XLH manager passes the header
and the identity of allocated memory for the packet to the source checker 307. The XLH manager recycles
memory blocks if any of the source checker, router, or NIM queue manager finds it impossible to transmit
the packet to a destination. Recycled memory blocks get used before memory blocks allocated by the
memory manager. Source checker 307 checks whether the source of the packet is properly logged in and
whether that source has access to the virtual network of the packet. Source checker 307 passes information
about the packet, including the packet address in MINT memory, to router 309, which translates the packet
group identification, effectively a virtual network name, and the destination name of the packet in order to
find out which output link this packet should be sent on. Router 309 passes the identification of the output
link to NIM queue manager 311, which identifies and chains packets received by the four XLHs of this MINT
which are headed for a common output link. After the first packet to a NIM queue has been received, the
NIM queue manager 311 sends a switch setup request to switch setup control 313 to request a connection
to that NIM. NIM queue manager 311 chains these packets in FIFO queues 316 of switch unit queue 314 so
that, when a switch connection is made in the circuit switch 10, all of these packets may be sent over that
connection at one time. Output control signal distributor 138 of the switch control 22 replies with an
acknowledgment when it has set up a connection. This acknowledgment is received by switch setup control
313, which informs NIM queue manager 311. NIM queue manager 311 then informs ILH 17 of the list of
chained packets in order that ILH 17 may transmit all of these packets. When ILH 17 has completed the
transmission of this set of chained packets over the circuit switch, it informs switch setup control 313 to
request a disconnect of the connection in switch 10, and informs memory manager 301 that the memory
which was used for storing the data of the message is now available for use for a new message. Memory
manager 301 sends this release information to memory distributor 303, which distributes memory to the
various XLH managers 305 for allocating memory to the XLHs.
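
The chaining behavior of the NIM queue manager described above can be sketched as a small data structure. This is an illustrative model only: the class name, method names and the form of the packet descriptors are assumptions, and the real queue manager runs on a Transputer exchanging messages with the other central control processors.

```python
from collections import defaultdict, deque

class NimQueueManager:
    """Toy model of per-NIM packet chaining (names are illustrative)."""

    def __init__(self):
        self.queues = defaultdict(deque)  # destination NIM -> chained packet descriptors
        self.setup_requests = []          # requests handed to switch setup control

    def enqueue(self, nim, descriptor):
        """Chain a routed packet onto the queue for its destination NIM."""
        first = not self.queues[nim]
        self.queues[nim].append(descriptor)
        if first:
            # Only the first packet queued for a NIM triggers a setup request;
            # later packets simply join the chain.
            self.setup_requests.append(nim)

    def connection_established(self, nim):
        """Hand the whole chain to the ILH once the switch acknowledges."""
        chain = list(self.queues[nim])
        self.queues[nim].clear()
        return chain
```

Because every packet that arrives before the acknowledgment joins the same chain, one connection in the switch serves the whole group, which is the point of chaining.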

Source checker 307 also passes billing information to operation, administration and maintenance
(OA&M) MINT processor 315 in order to perform billing for that packet and to accumulate appropriate
statistics for checking on the data flow within the MINT and, after combination with other statistics, in the
MAN network. Router 309 also informs OA&M MINT processor 315 of the destination of the packet so that
the OA&M MINT processor can keep track of data concerning packet destinations for subsequent traffic
analysis. The outputs of the four OA&M MINT processors 315 are sent to MINT OA&M monitor 317, which
summarizes the data collected by the four OA&M MINT processors for subsequent transmission to OA&M
central control 352 (FIG. 14).

MINT OA&M monitor 317 also receives information from OA&M central control 352 for making changes
via OA&M MINT processor 315 in the router 309 data; these changes reflect additional terminals added to
the network, the movement of logical terminals (i.e., terminals associated with a particular user) from one
physical port to another, or the removal of physical terminals from the network. Data is also provided from
the OA&M central control 352 via the MINT OA&M monitor and the OA&M MINT processor 315
to source checker 307 for such data as a logical user's password and physical port as well as data
concerning the privileges of each logical user.

4.8 MINT Operation, Administration, and Maintenance Control System

FIG. 15 is a block diagram of the maintenance and control system of the MAN network. Operation,
administration, and maintenance (OA&M) system 350 is connected to a plurality of OA&M central controls
352. These OA&M controls are each connected to a plurality of MINTs, and within each MINT, to the MINT
OA&M monitor 317 of MINT central control 20. Since many of the messages from OA&M system 350 must
be distributed to all the MINTs, the various OA&M central controls are interconnected by a data ring. This
data ring transmits such data as the identification of the network interface module, hence the identification
of the output link, of each physical port that is added to the network so that this information may be stored
in the router processors 309 of every MINT in the MAN hub.

5 LINKS 



5.1 Link Requirements

The links in the MAN system are used to transmit packets between the EUS and the NIM (EUSL) (links
14) and between the NIM and the MAN hub (XL) (links 3). Although the operation and the characteristics of
the data that is transferred on these links vary slightly with the particular application, the format used
on the links is the same. Having the formats be the same makes it possible to use common hardware and
software.

The link format is designed to provide the following features:

1. It provides a high data rate packet channel.

2. It is compatible with the proposed Metrobus "OS-1" format.

3. Interfacing is easier because of the word oriented synchronous format.

4. It defines how "packets" are delimited.

5. It includes a CRC for an entire "packet" (and another for the header).

6. The format ensures transparency of the data within a "packet".

7. The format provides a low bandwidth channel for flow control signaling.

8. Additional low bandwidth channels can be added easily.

9. Data scrambling ensures good transition density for clock recovery.



5.2 MAN Link Description and Reasoning

From a performance point of view, the faster the links are the better MAN will perform. This desire to
operate the links as fast as possible is tempered by the fact that faster links cost more. A reasonable
tradeoff between speed and cost is to use LED transmitters (like the AT&T ODL-200) and multimode fiber.
The use of ODL-200 transmitters and receivers puts an upper limit on the link speed of about 200 Mbit/sec.
From the MAN architecture point of view, the exact data rate of the links is not important since MAN does
not do synchronous switching. The data rate for the MAN links was chosen to be the same as the data rate
of the Metrobus Lightwave System "OS-1". The Metrobus format is described in M. S. Schaefer:
"Synchronous Optical Transmission Network for the Metrobus Lightwave Network", IEEE International
Communications Conference, June 1987, Paper 308.1.1. Another data rate (and format) that could be used
in MAN will come from the specification of SONET, a link layer protocol specified by Bell Communications
Research Corp. for 150 Mbit/sec unchannelized links.



5.2.1 Level 1 Link Format

The MAN network uses the low level link format of Metrobus. Information on the link is carried by a
simple frame that is continuously repeated. The frame consists of 88 16-bit words. The first word contains
a framing sequence and 4 parity bits. In addition to this first word, three other words are overhead words.
These overhead words, which are used for internode communications in the Metrobus implementation, are
not used by MAN for the sake of Metrobus compatibility. The word oriented nature of the protocol makes
using it much simpler. A simple 16 bit shift register with parallel load can be used to transmit and a similar
shift register with parallel read out can be used to receive. At the 146.432 Mbit/sec link data rate, a 16 bit
word is transmitted or received every 109 ns. This approach makes it possible to implement much of the link
formatting hardware at conventional TTL clock rates. The word oriented nature of the protocol does put
some restrictions on the way the link is used, however. To keep the complexity of the hardware reasonable
it is necessary to use the bandwidth of the link in units of 16 bit words.
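
The timing figures quoted above follow directly from the frame dimensions, as the short calculation below shows. The frame rate and the non-overhead capacity are derived here for illustration and are not stated in the text; the non-overhead figure also ignores any words later dedicated to low speed channels.

```python
LINK_RATE_BPS = 146_432_000  # link data rate given in the text
WORD_BITS = 16
FRAME_WORDS = 88
OVERHEAD_WORDS = 4           # framing word plus three other overhead words

# Time to shift one 16-bit word onto the link, in nanoseconds (about 109 ns).
word_time_ns = WORD_BITS / LINK_RATE_BPS * 1e9

# Bits per frame and the resulting frame repetition rate.
frame_bits = FRAME_WORDS * WORD_BITS
frames_per_sec = LINK_RATE_BPS / frame_bits

# Capacity left after the four overhead words (derived, not quoted in the text).
payload_bps = (FRAME_WORDS - OVERHEAD_WORDS) * WORD_BITS * frames_per_sec

print(f"word time   : {word_time_ns:.2f} ns")
print(f"frame       : {frame_bits} bits, {frames_per_sec:.0f} frames/s")
print(f"non-overhead: {payload_bps / 1e6:.3f} Mbit/s")
```

A word clock of roughly 9 MHz is what allows the formatting logic to run at conventional TTL rates even though the serial line runs at 146.432 Mbit/sec.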

5.2.2 Level 2 Link Format

The link is used to move "packets", the basic unit of information transfer in MAN. To identify packets,
the format includes the specification of "SYNC" words and an "IDLE" word. When no packets are being
transmitted the "IDLE" word will fill all of the words that make up the primary channel bandwidth (words not
reserved for other purposes). Packets are delimited by a leading START_SYNC and a trailing END_SYNC
word. This scheme works well as long as the words with special meanings are never contained in the data
within a packet. Since restricting the data that can be sent in a packet is an unreasonable restriction, a
transparent data transfer technique must be used. MAN links employ a very simple word stuffing
transparency technique. Within the packet data, any occurrence of a special meaning word, like the
START_SYNC word, is preceded by another special word, the "DLE" word. This word stuffing transparency
was chosen because of the simplicity of implementation. This protocol requires simpler, lower speed logic
than is required for bit stuffing protocols like HDLC. The technique itself is similar to the time proven
techniques used in IBM's BISYNC links. In addition to the word stuffing used to ensure transparency, "FILL"
words are inserted if the data rate of the source is slightly less than the link data rate.
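
The word stuffing scheme can be sketched in a few lines. The code below is a minimal illustration, not the link hardware: the 16-bit values assigned to the special words are hypothetical (the text names the words but does not give their encodings), and the sketch assumes that a DLE word appearing in the data is itself escaped like any other special word.

```python
# Hypothetical 16-bit code points; the text names these words but not their values.
IDLE, START_SYNC, END_SYNC, DLE, FILL = 0xFF00, 0xFF01, 0xFF02, 0xFF03, 0xFF04
SPECIAL = {IDLE, START_SYNC, END_SYNC, DLE, FILL}

def frame_packet(words):
    """Delimit a packet and escape any data word that collides with a special word."""
    out = [START_SYNC]
    for w in words:
        if w in SPECIAL:
            out.append(DLE)  # stuff a DLE word ahead of the colliding data word
        out.append(w)
    out.append(END_SYNC)
    return out

def unframe(stream):
    """Recover packets from a word stream containing IDLE, FILL and stuffed words."""
    packets, current, escaped = [], None, False
    for w in stream:
        if current is None:
            # Between packets: everything except START_SYNC (e.g. IDLE) is skipped.
            if w == START_SYNC:
                current = []
        elif escaped:
            current.append(w)  # word after DLE is always data
            escaped = False
        elif w == DLE:
            escaped = True
        elif w == END_SYNC:
            packets.append(current)
            current = None
        elif w != FILL:        # unescaped FILL is rate padding, not data
            current.append(w)
    return packets
```

For example, a packet whose data happens to contain the START_SYNC, DLE, END_SYNC and FILL values survives a round trip unchanged, even with IDLE words around it and a FILL word inserted mid-packet.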

The last word in any packet is a cyclic redundancy check (CRC) word. This word is used to ensure
that any corruption of the data in a packet can be detected. The CRC word is computed on all of the data in
the packet, excluding any special words like "DLE" that may need to be inserted in the data stream for
transparency or other reasons. The polynomial that is used to compute the CRC word is the CRC-16
standard.
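
As an illustration of the CRC word, the sketch below computes a CRC-16 over the unstuffed data words of a packet. The text specifies only "the CRC-16 standard"; the generator x^16 + x^15 + x^2 + 1 with zero initial value and least-significant-bit-first processing (the variant often called CRC-16/ARC) is assumed here, as is the big-endian split of each 16-bit word into bytes. The function names are hypothetical.

```python
def crc16_arc(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8005 (reflected form 0xA001), init 0."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the reflected polynomial when a 1 falls out.
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def packet_crc(words):
    """CRC over the data words only; stuffed DLE or FILL words are never included."""
    payload = b"".join(w.to_bytes(2, "big") for w in words)  # byte order is an assumption
    return crc16_arc(payload)
```

Because the CRC is taken before stuffing on transmit and after unstuffing on receive, the inserted transparency words cannot affect the check.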

To ensure good transition density for the optical receivers, all of the data is scrambled (e.g., block 296,
FIG. 13) prior to transmission. The scrambling makes it less likely that long sequences of ones or zeros will
be transmitted on the link even though they may be quite common in the data actually being transmitted.
The scrambler and descrambler (e.g., block 252, FIG. 12) are well known in the art. The descrambler design
is self synchronizing, which makes it possible to recover from occasional bit errors without having to restart
the descrambler.
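
The self synchronizing property can be demonstrated with a generic multiplicative scrambler. The text does not give the scrambler polynomial, so the taps below (1 + x^6 + x^7) are purely an assumption chosen for illustration, and the function names are hypothetical.

```python
def scramble(bits, taps=(6, 7)):
    """Multiplicative scrambler: each output bit is fed back into the register."""
    hist = [0] * max(taps)  # previously transmitted bits, most recent first
    out = []
    for b in bits:
        s = b
        for t in taps:
            s ^= hist[t - 1]
        out.append(s)
        hist = [s] + hist[:-1]
    return out

def descramble(bits, taps=(6, 7)):
    """Feed-forward descrambler: its register holds only received line bits."""
    hist = [0] * max(taps)
    out = []
    for s in bits:
        d = s
        for t in taps:
            d ^= hist[t - 1]
        out.append(d)
        hist = [s] + hist[:-1]  # state depends on the line, not on decoded data
    return out
```

Since the descrambler state is just the last few received bits, a single line error corrupts only the bit itself and the bits at each tap distance after it; once the bad bit has shifted out of the register, the output is correct again without any restart.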



5.2.3 Low Speed Channels and Flow Control

Not all of the payload words in the level 1 format are used for the level 2 format that carries packets.
Additional channels are included on the link by dedicating particular words within the frame. These low rate
channels (FIGS. 12 and 13) are used for MAN network control purposes. A packet delimiting
scheme similar to that used on the primary data channel is used on these low rate channels. The dedicated
words that make up low rate channels can be further divided down into individual bits for very low
bandwidth channels like the flow control channel. The flow control channel is used on the MAN EUSL
(between the EUS and the NIM) to provide hardware level flow control. The flow control channel (bit) from
the NIM to the EUS indicates to the EUS link transmitter whether or not it is allowed to transmit more
information. The design of the NIM is such that sufficient storage is available to absorb any data that is
transmitted prior to the EUS transmitter actually stopping after flow control is asserted. Data transmission
can be stopped either between packets or in the middle of a packet transmission. If it is between packets,
the next packet will not be sent until flow control is deasserted. If flow control is asserted in the
middle of a packet, it is necessary to suspend data transmission immediately and start sending the "Special
FILL" code word. This code word, like all others, is escaped with the "DLE" code word when it appears in
the body of a packet.



6 SYSTEM CLOCKING 

The MAN switch, as described in section 3, is an asynchronous space switch fabric with a very fast
setup controller. The data fabric of the switch is designed to reliably propagate digital signals with data rates
from DC to in excess of 200 Mbits/second. Since many paths can simultaneously exist through the fabric,
the aggregate bandwidth requirements of the MAN hub can be easily met by the fabric. This simple data
fabric is not without drawbacks, however. Because of mechanical and electrical constraints in implementing
the fabric, it is not possible for all paths through the switch to incur the same amount of delay. Because the
variations in path delay between different paths may be much greater than the bit time of the data going
through the switch, it is not possible to do synchronous switching. Any time that a path is set up from a
particular ILH in a MINT to an output port of the switch, there is no guarantee that data transmitted over that
path will have the same relative phase as the data transmitted over a previous path through the switch. To
use this high bandwidth switch it is therefore necessary to very quickly synchronize data coming out of a
switch port to the clock being used for the synchronous link to the NIM.

6.1 Phase Alignment and Scrambler Circuit (PASC)

The unit that must do the synchronization of data coming from the switch and drive the outgoing link to
the NIM is called the Phase Alignment and Scrambler Circuit (PASC) (block 290, FIG. 13). Since the ILHs and
the PASC circuits are all part of the MAN hub, it is possible to distribute the same master clock to all of
them. This has several advantages. By using the same clock reference in the PASC as is used to transmit
data from the ILH, one can be sure that data cannot be coming into the PASC any faster than it is being
moved out of it over the link. This eliminates the need for large FIFOs and elaborate elastic store controllers
in the PASC. The fact that the bit rate of all data that comes into a PASC is exactly the same makes the
synchronization easier.

The ILH and the PASC can be thought of as a distributed link handler for the format described in the
previous section. The ILH creates the basic framing pattern into which the data is inserted and transmits it
through the fabric to a PASC. The PASC aligns this framing pattern with its own framing pattern, merges in
the low speed control channel and then scrambles the data for transmission.

The PASC synchronizes the incoming data to the reference clock by inserting an appropriate amount of
delay into the data path. For this to work the ILH must be transmitting each frame with a reference clock
that is slightly advanced from the reference clock used by the PASC. The number of bit times of advance
that the ILH requires is determined by the actual minimum delay that may be incurred in getting from the
ILH to the PASC. The amount of delay that the PASC must be capable of inserting into the data path is
dependent on the possible variation in path delays that may occur for different paths through the switch.

FIG. 23 is a block diagram of an illustrative embodiment of the invention. Unaligned data enters a
tapped delay line 1001. The various taps of the delay line are clocked into edge sampling latches
1003,...,1005 by a signal that is 180 degrees out of phase with the reference clock (REFCLK), i.e., the
inverted REFCLK. The outputs of the edge sampling latches feed selection logic unit 1007, whose output
is used to control a selector 1013 described below. Selection logic 1007 includes a set of internal latches
for repeating the state of latches 1003,...,1005. The selection logic includes a priority circuit connected to
these internal latches for selecting the highest rank order input which carries a logical "one". The output is
a coded identification of this selected input. The selection logic 1007 has two gating signals: a clear signal
and a signal from all of a group of internal latches of the selection logic. Between data streams, the clear
signal goes to a zero state, causing the internal latches to accept new inputs. After the first "one" input has
been received from the edge sampling latches 1003,...,1005 in response to the first pulse of a data stream,
the state of the transparent latches is maintained until the clear signal goes back to the zero state. The clear
signal is set by out of band circuitry which recognizes the presence of a data stream.

The output of the tapped delay line also goes to a series of data latches 1009,...,1011. The input to the
data latches is clocked by the reference clock. The outputs of the data latches 1009,...,1011 are the inputs
to selector circuit 1013, which selects the output of one of these data latches based on the input from
selection logic 1007 and connects this output to the output of the selector 1013, which is the bit aligned
data stream as labeled on FIG. 23.
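
The interplay of the edge sampling latches, priority circuit and selector can be summarized in a behavioral model. This is a sketch under stated assumptions, not a description of the actual circuit: it works on already-sampled tap values, treats "highest rank order" as the highest tap index, and uses hypothetical class and method names.

```python
class PhaseSelector:
    """Behavioral sketch of selection logic 1007 driving selector 1013."""

    def __init__(self, n_taps=8):
        self.n_taps = n_taps
        self.selected = None  # latched tap index; None while cleared

    def clear(self):
        """Between data streams: let the internal latches accept new inputs."""
        self.selected = None

    def clock(self, edge_samples, data_samples):
        """Process one reference clock period.

        edge_samples: delay-line taps as latched on the inverted reference clock.
        data_samples: the same taps as latched on the reference clock.
        Returns the bit-aligned output, or None before any tap has been chosen.
        """
        if self.selected is None and any(edge_samples):
            # Priority circuit: pick the highest rank order input carrying a
            # logical one, then hold that choice for the rest of the stream.
            self.selected = max(i for i, v in enumerate(edge_samples) if v)
        if self.selected is None:
            return None
        return data_samples[self.selected]
```

The choice is made once, on the first pulse of a data stream, and is then frozen until the clear signal releases it. This mirrors the text's point that one gating selection is kept for the entire block.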

After the bits have been aligned, they are fed into a shift register (not shown) with tapped outputs to
feed the driver XLS. This is to allow data streams to be transmitted synchronously starting at sixteen bit
boundaries. The operation of the shift register and auxiliary circuitry is substantially the same as that of the
tapped delay line arrangement.

The selection logic is implemented in commercially available priority selection circuits. The selector is
simply a one out of eight selector controlled by the output of the selection logic. If it is necessary to have a
finer alignment circuit using a one of sixteen selection, this can be readily implemented using the same
principles. The arrangement described herein appears to be especially attractive in situations where there is
a common source clock and where the length of each data stream is limited. The common source clock is
required since the clock is not derived from the incoming signal, but is, in fact, used to gate an incoming
signal appropriately. The limitation on the length of the block is required since a particular gating selection
is maintained for the entire block, so that if the block length were too long, any substantial amount of phase
wandering would cause synchronism to be lost and bits to be dropped.

While in the present embodiment the signal is passed through a tapped delay line and is sampled by
the clock and inverse clock, the alternative arrangement of passing the clock through a tapped delay line
and using the delayed clocks to sample the signal could also be used in some applications.



6.2 Clock Distribution

The MAN hub operation is very dependent on the use of a single master reference clock for all of the
ILH and PASC units in the system. The master clock must be distributed accurately and reliably to all of the
units. In addition to the basic clock frequency that must be distributed, the frame start pulse must be
distributed to the PASC and an advanced frame start pulse must be distributed to the ILH. All of these
functions are handled by using a single clock distribution link (fiber or twisted pair) going to each unit.

The information that is carried on those clock distribution links comes from a single clock source. This
information can be split in the electrical and/or optical domain and transmitted to as many destinations as
necessary. There is no attempt to keep the information on all of the clock distribution links exactly in phase
since the ILH and PASC are capable of correcting for phase differences no matter what the reason for this
difference. The information that is transmitted is simply alternating ones and zeros with two exceptions. The
occurrence of two ones in a row indicates an advanced frame pulse and the occurrence of two zeroes in a
row indicates a normal frame pulse. Each board that terminates one of these clock distribution links
contains a clock recovery module. The clock recovery module is the same as that used for the links
themselves. The clock recovery module will provide a very stable bit clock while additional logic extracts
the appropriate frame or advanced frame from the data itself. Since the clock recovery modules will
continue to oscillate at the correct frequency even without bit transitions for several bit times, even the
unlikely occurrence of a bit error will not affect the clock frequency. The logic that looks for the frame or
advanced frame signal can also be made tolerant of errors since it is known that the frame pulses are
periodic and extraneous pulses caused by bit errors can be ignored.
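
The frame signalling just described can be illustrated with a short Python sketch. It is not the circuit
of the embodiment: the function name find_frame_pulses, the list-of-bits representation, and the labels
"advanced" and "normal" are assumptions introduced here for illustration. The sketch scans a recovered
bit stream that nominally alternates ones and zeros and reports a repeated one as an advanced frame
pulse and a repeated zero as a normal frame pulse.

```python
def find_frame_pulses(bits):
    """Return (index, kind) for every place the alternating pattern is broken.

    bits is a sequence of 0/1 values as delivered by the clock recovery
    module (hypothetical representation). A repeated 1 marks an advanced
    frame pulse; a repeated 0 marks a normal frame pulse.
    """
    pulses = []
    for i in range(1, len(bits)):
        if bits[i] == bits[i - 1]:
            kind = "advanced" if bits[i] == 1 else "normal"
            pulses.append((i, kind))
    return pulses


stream = [1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
print(find_frame_pulses(stream))
```

Because the pulses are known to be periodic, a practical detector could additionally discard any
reported pulse that does not fall at the expected spacing; that filtering is left out of the sketch.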

7 NETWORK INTERFACE MODULE 



7.1 Overview

The network interface module (NIM) connects one or more end user system links (EUSL) to one MAN
external link (XL). In so doing, the NIM performs concentration and demultiplexing of network transaction
units (i.e. packets and SUWUs), as well as ensuring source identification integrity by affixing a physical
"source port number" to each outgoing packet. The latter function, in combination with the network
registration service described in §2.4, prevents a user from masquerading as another for the purpose of
gaining access to unauthorized network-provided services. The NIM thereby represents the boundary of the
MAN network proper; NIMs are owned by the network provider, while UIMs (described in §8) are owned by
the users themselves.

This section describes the basic functions of the NIM in more detail, and presents the NIM architecture.



7.2 Basic Functions 

The NIM must perform the following basic functions:

EUS link interfacing. One or more interfaces must be provided to EUS link(s) (see § 2.2.5). The
downstream link (i.e. from NIM to UIM) consists of a data channel and an out-of-band channel used by the
NIM to flow control the upstream link when NIM input buffers become full. Because the downstream link is
not flow controlled, the flow control channel on the upstream link is unused. The Data and Header Check
Sequences (DCS, HCS) are generated by the UIM on the upstream link, and checked by the UIM on the
downstream link.

External link interfacing. The XL (§ 2.2.6) is very similar to the EUSL, but lacks DCS checking and
generation on both ends. This is to allow erroneous, but still potentially useful, data to be delivered to the
UIM. The destination port numbers in network transaction units arriving on the downstream XL are checked
by the NIM, with illegal values resulting in dropped data.

Concentration and demultiplexing. Network transaction units arriving on the EUSLs contend for and are
statistically multiplexed to the outgoing XL. Those arriving on the XL are routed to the appropriate EUSL by
mapping the destination port number to one or more EUS links.

Source port identification. The port number of the source UIM is prepended to each network transaction unit
by port number generator 403 (FIG. 16). This port number will be checked against the MAN
address by the MINT to prevent unauthorized access to services (including the most basic data transport
service) by imposters.
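
The source port identification function can be sketched as follows. This is a minimal illustration under
stated assumptions, not the MINT's actual check: the dictionary layout of a transaction unit, the field
name source_address, the registration table, and both function names are hypothetical. The point shown
is only that the port number is attached by the NIM, outside the user's control, and is later compared
with the address the unit claims to come from.

```python
def tag_with_port(port_number, unit):
    """NIM side: attach the physical source port number to a transaction unit."""
    return {"port": port_number, "unit": unit}


def source_is_genuine(tagged, registered_ports):
    """Hub side: accept only if the claimed address is registered on this port.

    registered_ports maps a user address to the port it was registered on
    (a stand-in for the network registration service).
    """
    claimed = tagged["unit"]["source_address"]
    return registered_ports.get(claimed) == tagged["port"]


registry = {"alice": 3, "bob": 7}
honest = tag_with_port(3, {"source_address": "alice", "data": b"x"})
forged = tag_with_port(7, {"source_address": "alice", "data": b"x"})
print(source_is_genuine(honest, registry), source_is_genuine(forged, registry))
```

A user on port 7 who writes another user's address into a packet cannot alter the port number affixed
by the NIM, so the mismatch is detectable downstream.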

7.3 Architecture and Operation

The architecture of the NIM is depicted in FIG. 16. The following subsections briefly describe the
operation of the NIM.



7.3.1 Upstream Operation 

Incoming network transaction units are received from the UIMs at their EUSL interface 400 receivers
402, are converted to words in serial to parallel converters 404 and are accumulated in FIFO buffers 94.
Each EUSL interface is connected to the NIM transmit bus 95, which consists of a parallel data path and
various signals for bus arbitration and clocking. When a network transaction unit has been buffered, the
EUSL interface 400 arbitrates for access to the transmit bus 95. Arbitration proceeds in parallel with data
transmission on the bus. When the current data transmission is complete, the bus arbiter awards bus
ownership to one of the competing EUSL interfaces, which begins transmission. For each transaction, the
EUSL port number, inserted at the beginning of each packet by port number generator 403, is transmitted
first, followed by the network transaction unit. Within an XL interface 440, the XL transmitter 96 provides the
bus clock, and performs parallel to serial conversion 442 and data transmission on the upstream XL 3.
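
The upstream path can be modelled in a few lines of Python. The sketch below makes several assumptions
that are not in the description: the arbiter is round robin (the arbitration policy is not specified),
the port number is a single prefix byte, and the class and function names are invented for
illustration. It shows only the two properties stated above: buffered units from several EUSL
interfaces share one upstream link, and each unit is preceded by the port number of the interface it
arrived on.

```python
from collections import deque


class EuslInterface:
    """One EUSL interface: a port number plus a FIFO of buffered units."""

    def __init__(self, port_number):
        self.port_number = port_number
        self.fifo = deque()

    def receive(self, unit):
        self.fifo.append(unit)


def concentrate(interfaces):
    """Drain all FIFOs onto one upstream link, one unit per bus grant.

    Round-robin granting is an assumption; the port number is sent
    first, followed by the network transaction unit.
    """
    upstream = []
    while any(iface.fifo for iface in interfaces):
        for iface in interfaces:
            if iface.fifo:
                unit = iface.fifo.popleft()
                upstream.append(bytes([iface.port_number]) + unit)
    return upstream


a, b = EuslInterface(1), EuslInterface(2)
a.receive(b"AA")
a.receive(b"CC")
b.receive(b"BB")
print(concentrate([a, b]))
```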


7.3.2 Downstream Operation 

Network transaction units arriving from the MINT on the downstream XL 3 are received within XL
interface 440 by the XL receiver 446, which is connected via serial to parallel converter 448 to the NIM
receive bus 430. The receive bus is similar to, but independent of, the transmit bus. Also connected to the
receive bus via a parallel to serial converter 408 are the EUSL interface transmitters 410. The XL receiver
performs serial to parallel conversion, provides the receive bus clock, and sources the incoming data onto
the bus. Each EUSL interface decodes the EUSL port number associated with the data, and forwards the
data to its EUSL if appropriate. More than one EUSL interface may forward the data if required, as in a
broadcast or multicast operation. Each decoder 409 checks the receive bus 430 while port number(s) are
being transmitted to see if the following packet is destined for the end user of this EUSL interface 400; if so,
the packet is forwarded to transmitter 410 for delivery to an EUSL 14. Illegal EUSL port numbers (e.g.
violations of the error coding scheme) result in the data being dropped (i.e. not forwarded by any EUSL
interface). Decode block 409 is used to gate information destined for a particular EUS link from receive bus
430 to the parallel/serial converter 408 and transmitter 410.
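
The downstream decoding can be sketched in the same style. The class EuslDecoder and the function
receive_bus_cycle are hypothetical names, and the error coding of port numbers is not modelled; an
illegal number is represented simply as one that no decoder recognizes. Each decoder watches the port
number(s) that precede a packet and forwards the packet only if its own port is listed, which gives
unicast, multicast, and dropping of unmatched data from one rule.

```python
class EuslDecoder:
    """Decoder for one EUSL interface on the receive bus."""

    def __init__(self, own_port):
        self.own_port = own_port
        self.delivered = []  # packets forwarded to this interface's transmitter

    def observe(self, port_numbers, packet):
        """Forward the packet if this interface's port was announced."""
        if self.own_port in port_numbers:
            self.delivered.append(packet)
            return True
        return False


def receive_bus_cycle(decoders, port_numbers, packet):
    """Present one packet on the bus; return how many interfaces took it."""
    return sum(d.observe(port_numbers, packet) for d in decoders)


decoders = [EuslDecoder(p) for p in (1, 2, 3)]
print(receive_bus_cycle(decoders, [1, 3], b"multicast"))  # two interfaces forward
print(receive_bus_cycle(decoders, [9], b"stray"))         # no interface forwards
```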



8 INTERFACING TO MAN 
8.1 Overview

A user interface module (UIM) consists of the hardware and software necessary to connect one or more
end user systems (EUS), local area networks (LAN), or dedicated point-to-point links to a single MAN end
user system link (EUSL) 14. Throughout this section, the term EUS will be used to generically refer to any
of these network end user systems. Clearly, a portion of the UIM used to connect a particular type of EUS
to MAN is dependent on the architecture of that EUS, as well as the desired performance, flexibility, and
cost of the implementation. Some of the functions provided by a UIM, however, must be provided by every
UIM in the system. It is therefore convenient to view the architecture of a UIM as having two distinct halves:
the network interface, which provides the EUS-independent functionality, and the EUS interface, which
implements the remainder of the UIM functions for the particular type of EUS being connected.

Not all EUSs will require the performance inherent in a dedicated external link. The concentration
provided by a NIM (described in §7) is an appropriate way to provide access to a number of EUSs which
have stringent response time requirements along with the instantaneous I/O bandwidth necessary to
effectively utilize the full MAN data rate, but which do not generate the volume of traffic necessary to
efficiently load the XL. Similarly, several EUSs or LANs could be connected to the same UIM via some
intermediate link (or the LANs themselves). In this scenario, the UIM acts as a multiplexer by providing
several EUS (actually LAN or link) interfaces to go with one network interface. This method is well suited to
EUSs which do not allow direct connections to their system busses, and which provide only a link
connection that is itself limited in bandwidth. End users can provide their multiplexing or concentration at a
UIM and MAN can provide further multiplexing or concentration at the NIM.

This section examines the architectures of both the network interface and EUS interface halves of the
UIM. The functions provided by the network interface are described and the architecture is presented. The
heterogeneity of EUSs that may be connected to MAN does not allow such a generic treatment of the EUS
interfaces. Instead, the EUS interface design options are explored, and a specific example of an EUS is
used to illustrate one possible EUS interface design.

8.2 UIM - Network Interface

The UIM network interface implements the EUS-independent functions of the UIM. Each network
interface connects one or more EUS interfaces to a single MAN EUSL.



8.2.1 Basic Functions

The UIM network interface must perform the following functions:

EUS link interfacing. The interface to the EUS link includes an optical transmitter and receiver, along with
all hardware necessary to perform the link level functions required by the EUSL (e.g. CRC generation and
checking, data formatting, etc.).

Data buffering. Outgoing network transaction units (i.e. packets and SUWUs) must be buffered so that they
may be transmitted on the fast network link without gaps. Incoming network transaction units are buffered
for purposes of speed matching, and level three (and above) protocol processing.

Buffer memory management. The packets of one LUWU may arrive at the receive UIM interleaved with
those of another LUWU. In order to support this concurrent reception of several LUWUs, the network
interface must manage its receive buffer memory in a dynamic fashion, allowing incoming packets to be
chained together into LUWUs as they arrive.

Protocol processing. Outgoing LUWUs must be fragmented into packets for transmission into the network.
Similarly, incoming packets must be recombined into LUWUs for delivery to the receiving process within
the EUS.
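
The last two functions in the list above can be made concrete with a small Python sketch. It is
illustrative only: the packet fields (luwu, seq, last, payload), the names fragment and Reassembler, and
the assumption that packets of one LUWU arrive in order are all introduced here and are not taken from
the MAN packet format. The sketch fragments two LUWUs, interleaves their packets, and chains them back
together on the receive side.

```python
def fragment(luwu_id, data, max_payload):
    """Split one LUWU into packets of at most max_payload bytes."""
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)] or [b""]
    return [
        {"luwu": luwu_id, "seq": n, "last": n == len(chunks) - 1, "payload": c}
        for n, c in enumerate(chunks)
    ]


class Reassembler:
    """Chains packets per LUWU so that several LUWUs can arrive interleaved."""

    def __init__(self):
        self.chains = {}  # LUWU id -> list of packets received so far

    def accept(self, packet):
        """Return the complete LUWU when its last packet arrives, else None."""
        chain = self.chains.setdefault(packet["luwu"], [])
        chain.append(packet)
        if packet["last"]:
            del self.chains[packet["luwu"]]
            return b"".join(p["payload"] for p in chain)
        return None


a = fragment("A", b"abcdefg", 3)
b = fragment("B", b"12345", 3)
rx = Reassembler()
for pkt in (a[0], b[0], a[1], b[1], a[2]):  # interleaved arrival
    done = rx.accept(pkt)
    if done is not None:
        print(pkt["luwu"], done)
```

Keeping one chain per LUWU is what lets the receive memory be managed dynamically: blocks are attached
to whichever LUWU the arriving packet belongs to, in whatever order the network delivers them.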

8.2.2 Architectural Options

Clearly, all of the functions enumerated in the previous subsection must be performed in order to
interface any EUS to a MAN EUSL. However, some architectural decisions must be made regarding where
these functions are performed; i.e., whether they are internal or external to the host itself.

The first two functions must be located external to the host, although for different reasons. The first and
lowest level function, that of interfacing to the MAN EUS link, must be implemented externally simply
because it consists of special purpose hardware which is not part of a generic EUS. The EUS link interface
simply appears as a bidirectional I/O port to the remainder of the UIM network interface. On the other hand,
the second function, data buffering, cannot be implemented in existing host memory because the bandwidth
requirements are too stringent. On reception, the network interface must be able to buffer incoming packets
or SUWUs back-to-back at the full network data rate (150 Mb/s). This data rate is such that it is generally
impossible to deposit incoming packets directly into EUS memory. Similar bandwidth constraints apply to
packet and SUWU transmission as well, since they must be completely buffered and then transmitted at the
full 150 Mb/s rate. These constraints make it desirable to provide the necessary buffer memory external to
the EUS. It should be noted that while FIFO memory will suffice to provide the necessary speed matching
for transmission, the lack of flow control on reception along with the interleaving of received packets
necessitate that a larger amount of random access memory be provided as receive buffer memory. For
MAN, the size of receive buffer memory may range from 256 Kbytes to 1 Mbyte. The particular size
depends on the interrupt latency of the host and on the maximum size LUWU allowed by the host software.
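
The relation between buffer size and host latency can be checked with simple arithmetic. The sketch
below assumes the worst case of data arriving continuously at the full 150 Mb/s rate with nothing being
drained, and takes a Kbyte as 1024 bytes; both are assumptions made for illustration, and the function
name is hypothetical.

```python
LINK_RATE_BPS = 150_000_000  # full network data rate, bits per second


def fill_time_ms(buffer_bytes, rate_bps=LINK_RATE_BPS):
    """Time in milliseconds to fill a buffer at the given line rate."""
    return buffer_bytes * 8 / rate_bps * 1000


for size in (256 * 1024, 1024 * 1024):
    print(size, round(fill_time_ms(size), 2))
```

Under these assumptions a 256 Kbyte buffer absorbs roughly 14 ms of back-to-back traffic and a 1 Mbyte
buffer roughly 56 ms, which indicates the order of host response time each size can tolerate.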

The final two functions involve processing, which could conceivably be performed by the host
processor itself. The third function, buffer memory management, involves the timely allocation and
deallocation of blocks of receive buffer memory. The latency requirement associated with the allocation
operation is stringent, due once more to the high data rates and the possibility of packets arriving
back-to-back. However, this can be alleviated (for reasonable burst sizes) by pre-allocating several blocks of
memory. It is possible, therefore, for the host processor to manage the receive packet buffers. Similarly, the
host processor may or may not assume the burden of the fourth function, that of MAN protocol processing.
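
Pre-allocation can be sketched as a small queue of ready block addresses that is topped up by a slower
manager. The class BlockPool, its method names, and the use of integers as block addresses are
assumptions for illustration. The fast path that runs per arriving packet only pops an address that is
already available; the allocation decision itself happens in refill, which can run with relaxed latency.

```python
from collections import deque


class BlockPool:
    """Receive-buffer blocks with a short queue of pre-allocated addresses."""

    def __init__(self, free_blocks, prealloc_depth):
        self.free = deque(free_blocks)   # blocks not yet handed out
        self.ready = deque()             # addresses pre-allocated for arrivals
        self.depth = prealloc_depth
        self.refill()

    def refill(self):
        """Slow path (e.g. host processor): top up the pre-allocated queue."""
        while len(self.ready) < self.depth and self.free:
            self.ready.append(self.free.popleft())

    def next_block(self):
        """Fast path, once per arriving packet."""
        if not self.ready:
            raise RuntimeError("burst exceeded the pre-allocated blocks")
        return self.ready.popleft()

    def release(self, block):
        """Return a block to the free list after its data has been delivered."""
        self.free.append(block)


pool = BlockPool(range(8), prealloc_depth=3)
burst = [pool.next_block() for _ in range(3)]  # back-to-back arrivals
pool.refill()                                  # manager catches up afterwards
print(burst, pool.next_block())
```

The depth of the ready queue bounds the burst that can be absorbed before the manager must run again,
which is the sense in which pre-allocation helps only "for reasonable burst sizes".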

The location of these final two functions determines the level at which the EUS connects to the UIM. If
the host CPU assumes the burden for packet buffer memory management and MAN protocol processing
(the "local" configuration), then the unit of data transferred across the EUS interface is a packet, and the
host is responsible for fragmenting and recombining LUWUs. If, on the other hand, those functions are
off-loaded to another processor in the UIM, the front end processor (FEP) configuration, the unit of data
transferred across the EUS interface is a LUWU. While in theory, subject to interleaving constraints at the
EUS interface, the unit of data transferred may be any amount less than or equal to the entire LUWU, and
the units delivered by the transmitter need not be the same size as those accepted by the receiver, for a
general and uniform solution, useful for a variety of EUSs, the LUWU is to be preferred as the basic unit.
The FEP configuration offloads the majority of the processing burden from the host CPU, as well as
providing for a higher level EUS interface, thereby hiding the details of the network operation from the host.
With the FEP, the host knows only about LUWUs, and can control their transmission and reception at a
higher, less CPU intensive level.

Although a lower cost interface is possible utilizing the local configuration, the network interface
architecture described in the following section is a FEP configuration more characteristic of that required by
some of the high performance EUSs that are natural users of a MAN network. An additional reason for
choosing the FEP configuration initially is that it is better suited for interfacing MAN to a LAN such as
ETHERNET, in which case there is no "host CPU" to provide buffer memory management and protocol
processing.

8.2.3 Network Interface Architecture

The architecture of the UIM network interface is depicted in FIG. 17. The following subsections briefly
describe the operation of the UIM network interface by presenting scenarios for the transmission and
reception of data. An FEP-type architecture is employed, i.e., receive buffer memory management and
MAN network layer protocol processing are performed external to the host CPU of the EUS.



8.2.3.1 Transmission of Data

The main responsibilities of the network interface on transmission are to fragment the arbitrary sized
transmit user work units (UWUs) into packets (if necessary), encapsulate the user data in the MAN header
and trailer, and transmit the data to the network. To begin transmission, a message from the EUS
requesting transmission of a LUWU traverses the EUS interface and is handled by network interface
processing 450, which also implements memory management and protocol processing functions. For each
packet, the protocol processor portion of the interface processing 450 formulates a header and writes it into
the transmit FIFO 15. Data for that packet is then transferred across the EUS interface 451 into the transmit
FIFO 15 within link handler 460. When the packet is completely buffered, the link handler 460 transmits it
onto the MAN EUS link using transmitter 454, followed by the trailer, which was computed by the link
handler 460. The link is flow controlled by the NIM to ensure that the NIM packet buffers do not overflow.
This transmission process is repeated for each packet. The transmit FIFO 15 contains space for two
maximum length packets so that packet transmission may occur at the maximum rate. The user is notified
via the EUS interface 451 when the transmission is complete.

8.2.3.2 Reception of Data

Incoming data is received by receiver 468 and loaded at the 150 Mb/s link rate into an elastic buffer.
Dual-ported video RAM is utilized for the receive buffer memory 90, and the data is unloaded from the
elastic buffer and loaded into the shift register 464 of receive buffer memory 90 via its serial access port.
Each packet is then transferred from the shift register into the main memory array 466 of the receive buffer
memory under the control of the receiver DMA sequencer 453. The block addresses used to perform these
transfers are provided by the network interface processing arrangement 450 of UIM 13 via the buffer
memory controller 456, which buffers a small number of addresses in hardware to relieve the strict latency
requirements which would otherwise be imposed by back-to-back SUWUs. Block 450 is composed of
blocks 530, 540, 542, 550, 552, 554, 556, 558, 560, and 562 of FIG. 19. Because the network interface
processing has direct access to the buffer memory via its random access port, headers are not stripped off;
rather they are placed into buffer memory along with the data. The receive queue manager 558 within 450
handles the headers and, with input from the memory manager 550, keeps track of the various SUWUs and
LUWUs as they arrive. The EUS is notified of the arrival of data by the network interface processing
arrangement 450 via the EUS interface. The details of how data is delivered to the EUS are a function of the
particular EUS interface being employed, and are described, for example, in section 8.3.3.2.



8.3 UIM - EUS Interfaces



8.3.1 Philosophy 






This section describes the "half" of the network interface that is EUS dependent. The basic function of
the EUS interface is the delivery of data between the EUS memory and the UIM network interface, in both
directions. Each particular EUS interface will define the protocol to effect delivery, the format of data and
control messages, and the physical path for control and data. Each side of the interface has to implement a
flow control mechanism to protect itself from being overrun. The EUS must be able to control its own
memory and the flow of data into it from the network, and the network has to be able to protect itself as
well. Only at this basic functional level is it possible to talk about commonality in EUS interfaces. EUS
interfaces will be different because of EUS hardware and system software differences. The needs of the
applications using the network, coupled with the capabilities of the EUS, will also force interface design
decisions dealing with performance and flexibility. There will be numerous interface choices even for a
single type of EUS. These choices mean that the interface hardware can range from simple designs with few
components to complex designs including sophisticated buffering and memory management schemes.
Control functions in the interface can range from simple EUS interfaces to handling network level 3
protocols and even higher level protocols for distributed applications. Software in the EUS can also range
from straightforward data transmission schemes that fit underneath existing networking software, to more
extensive new EUS software that would allow very flexible uses of the network or allow the highest
performance that the network has to offer. These interfaces must be tailored to the specific existing EUS
hardware and software systems, but there must also be an analysis of the cost of interface features in
comparison to the benefits they would deliver to the network applications running in these EUSs.

The tradeoff between a front end processor (FEP) and EUS processing is one example of different
interface approaches to accomplish the same basic function. Consider variations in receive buffering. A
specialized EUS architecture with a high performance system bus could receive network packet messages
directly from the network links. However, usually the interface will at least buffer packet messages as they
come off the link, before they are delivered into EUS memory. Normally EUSs, either transmitting to or
receiving from the network, do not know (or want to know) anything about the internal packet message. In
that case, the receiving interface might have to buffer multiple packets that come from the LUWU of data
that is the natural sized transmission unit between the transmit and receive EUSs. Each one of these three
receive buffering situations is possible and each would require a significantly different EUS interface to
transfer data into the EUS memory. If the EUS has a particular need to process network packet messages
and has the processing power and system bus performance to devote to that task, then the EUS dependent
portion of the network interface would be simple. However, often it will be desirable to off-load that
processing into the EUS interface and improve the EUS performance.

Different transmit buffering approaches also illustrate the tradeoff between FEP and EUS processing.
For a specialized application, an EUS with high performance processor and bus could send network packet
messages directly into the network. But if the application used EUS transaction sizes that were much larger
than the packet message size, it might take too much of the EUS processing to produce packet messages
on its own. An FEP could offload that work of doing this level 3 network protocol formatting. This would also
be the case where the EUS wishes to be independent of the internal network message size, or where it has
a diverse set of network applications with a great variation in transmission size.

Depending on the hardware architecture of the EUS, and the level of performance desired, there is the
choice between programmed I/O and DMA to move data between EUS memory and the network interface.
In the programmed I/O approach, probably both control and data will move over the same physical path. In
the DMA approach there will be some kind of shared memory interface to move control information in an
EUS interfacing protocol, and a DMA controller in the EUS interface to move data between buffer memory
and EUS memory over the EUS system bus without using EUS processor cycles.

There are several alternatives that exist for the location of EUS buffering for network data. The data
could be buffered on a front end processor network controller circuit board with its own private memory.
This memory can be connected to the EUS by busses using DMA transfer or dual ported memory
accessed via a bus or dual ported memory located on the CPU side of a bus using private busses. The
application now must access the data. Various techniques are available; some involve mapping the end user
work space directly to the address space used by the UIM to store the data. Other techniques require the
operating system to further buffer the data and recopy into the user's private address space.

Options exist in writing the driver level software in the EUS that is responsible for moving control and
data information over the interface. The driver could also implement the EUS interface protocol processing
as well as just moving bits over the interface. For the driver to still run efficiently the protocol processing in
the driver might not be very flexible. For more flexibility based on a particular application, the EUS interface
protocol processing could be moved up to a higher level. Closer to the application, more intelligence could
be applied to the interface decisions, at the expense of more EUS processing time. The EUS could
implement various interface protocol approaches for delivery of data to and from the network: prioritization,
preemption, etc. Network applications that did not require such flexibility could use a more direct interface
to the driver and the network.

So, there are a variety of choices to be made at different levels in the system in both the hardware and
the software.



Implementation Example: SUN Workstation Interface



To illustrate the EUS dependent portion of the interface we describe one specific interface. The
interface is to the Sun-3 VME bus based workstations manufactured by Sun Microsystems, Inc. This is an
example of a single EUS connected to a single network interface. The EUS also allows connection directly
to its system bus. The UIM hardware is envisioned as a single circuit board that plugs into the VME bus
system bus.

First, there follows a description of the Sun I/O architecture, and then a description of the choices made
in designing the interface hardware, the interface protocol, and the connection to new and existing network
applications software.



8.3.3.1 SUN Workstation I/O Architecture

The Sun-3's I/O architecture, based on the VME bus structure and its memory management unit
(MMU), provides a DMA approach called direct virtual memory access (DVMA). FIG. 18 shows the Sun
DVMA. DVMA allows devices on the system bus to do DMA directly to Sun processor memory, and also
allows main bus masters to do DMA directly to main bus slaves without going through processor memory. It
is called "virtual" because the addresses that a device on the system bus uses to communicate with the
kernel are virtual addresses similar to those the CPU would use. The DVMA approach makes sure that all
addresses used by devices on the bus are processed by the MMU, just as if they were virtual addresses
generated by the CPU. The slave decoder 512 (FIG. 18) responds to the lowest megabyte of VME bus
address space (0x0000 0000 → 0x000f ffff in the 32 bit VME address space) and maps this megabyte into
the most significant megabyte of the system virtual address space (0xff0 0000 → 0xfff ffff in the 28 bit
virtual address space). (0x means that the subsequent characters are hexadecimal characters.) When the
driver needs to send the buffer address to the device, it must strip off the high 8 bits from the 28 bit
address, so that the address that the device puts on the bus will be in the low megabyte (20 bits) of the
VME address space.
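
The address arithmetic in this paragraph can be written out directly. The constant and function names
below are hypothetical; only the numeric ranges come from the text. Stripping the high 8 bits of a 28 bit
address is a mask with the low 20 bits, and the slave decoder's mapping in the other direction places
the 20 bit bus address in the top megabyte of the virtual space.

```python
VME_LOW_MEGABYTE_MASK = 0x00FFFFF  # low 20 bits: 0x0000 0000 to 0x000f ffff on the bus
DVMA_VIRTUAL_BASE = 0xFF00000      # start of the top megabyte of the 28 bit space
VIRTUAL_SPACE_TOP = 0xFFFFFFF      # highest 28 bit virtual address


def kernel_to_device_address(virtual_address):
    """Driver side: strip the high 8 bits before giving the address to the device."""
    if not DVMA_VIRTUAL_BASE <= virtual_address <= VIRTUAL_SPACE_TOP:
        raise ValueError("address is outside the DVMA megabyte")
    return virtual_address & VME_LOW_MEGABYTE_MASK


def device_to_kernel_address(vme_address):
    """Slave decoder side: map a low-megabyte bus address into virtual space."""
    if not 0 <= vme_address <= VME_LOW_MEGABYTE_MASK:
        raise ValueError("address is outside the low megabyte of VME space")
    return DVMA_VIRTUAL_BASE | vme_address


print(hex(kernel_to_device_address(0xFF12345)))
print(hex(device_to_kernel_address(0x12345)))
```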

In FIG. 18, the CPU 500 drives a memory management unit 502, which is connected to a VME bus 504
and on board memory 506 that includes a buffer 508. The VME bus communicates with DMA devices 510.
Other on board bus masters, such as an ETHERNET access chip, can also access memory 506 via MMU
502. Thus, devices can only make DVMA transfers in memory buffers that are reserved as DVMA space in
these low (physical) memory areas. The kernel does however support redundant mapping of physical
memory pages into multiple virtual addresses. In this way, a page of user memory (or kernel memory) can
be mapped into DVMA space in such a way that the data appears in (or comes from) the address space of
the process requesting that operation. The driver uses a routine called mbsetup to set up the kernel page
maps to support this direct user space DVMA.



8.3.3.2 SUN UIM - EUS Interface Approach

As mentioned above there are many options in designing a particular interface. For the Sun-3
interface, the design combines a DMA transfer approach, FEP capabilities, high
performance matching the system bus, and EUS software flexible enough to allow various new and existing
network applications to use the network. FIG. 19 shows an overview of the interface to the Sun-3.

The Sun-3's are systems with potentially many simultaneous processes running in support of the
window system, and multiple users. The DMA and FEP approaches were chosen to offload the Sun
processor while the network transfers are taking place. The UIM hardware is envisioned as a single circuit
board that plugs into the VME bus system bus. With the chance to connect directly to the system bus it is
desirable to attempt the highest performance interface possible. Sun's DVMA provides a means to move
data efficiently to and from processor memory. There is a DMA controller 92 in the UIM (FIG. 4) to move
data from the UIM to EUS memory and data from EUS memory to the UIM over the bus, and there will be a
shared memory interface to move control information in the host interfacing protocol. The front end
processor (FEP) approach means that the data from the network is presented to the EUS at a higher level.
Level 3 protocol processing has been performed and packets have been linked together into LUWUs, the
user's natural sized unit of transmission. With the potential variety of network applications that could be
running on the Sun, the FEP approach means that EUS software does not have to be tightly coupled to the
internal network packet format.

The Sun-3 DVMA architecture will limit the EUS transaction sizes to a maximum of one megabyte. If
user buffers are not locked in, then kernel buffers would be used as an intermediate step between the
device and the user, with the associated performance penalty for the copy operation. If transfers are going
to be made directly to user space, using the "mbsetup" approach, the user's space will be locked into
memory, not available for swapping, during the whole transfer process. This is a tradeoff; it ties up the
resources in the machine, but it may be more efficient if it avoids a copy operation from some other buffer
in the kernel.

The Sun system has existing network applications running on ETHERNET, for example their Network
File System (NFS). To run these existing applications on MAN but still leave open the possibility for new
applications that could use the expanded capabilities of MAN, we needed flexible EUS software and a
flexible interface protocol to be able to simultaneously handle a variety of network applications.

FIG. 19 is a functional overview of the operation and interfaces among the NIM, UIM, and EUS. The
specific EUS shown in this illustrative example is a Sun-3 workstation, but the principles apply to other end
user systems having greater or lesser sophistication. Consider first the direction from the MINT via the NIM
and UIM to the EUS. As shown in FIG. 4, data that is received from MINT 11 over link 3 is distributed to
one of a plurality of UIMs 13 over links 14 and is stored in receive buffer memory 80 of such a UIM, from
which data is transmitted in a pipelined fashion over an EUS bus 92 having a DMA interface to the
appropriate EUS. The control structure for accomplishing this transfer of data is shown in FIG. 19, which
shows that the input from the MINT is controlled by a MINT to NIM link handler 520, which transmits its
output under the control of router 522 to one of a plurality of NIM to UIM link handlers (N/U LH) 524.
MINT/NIM link handler (M/N LH) 520 supports a variant on the Metrobus physical layer protocol. The NIM to
UIM link handler 524 also supports the Metrobus physical layer protocol in this implementation, but other
protocols could be supported as well. It is possible that different protocols could coexist on the same NIM.
The output of the N/U LH 524 is sent over a link 14 to a UIM 13, where it is buffered in receive buffer
memory 80 by NIM/UIM link handler 552. The buffer address is supplied by memory manager 550, which
manages free and allotted packet buffer lists. The status of the packet reception is obtained by N/U LH
552, which computes and verifies the checksum over header and data, and outputs the status information to
receive packet handler 556, which pairs the status with the buffer address received from memory manager
550 and queues the information on a received packet list. Information about received packets is then
transferred to receive queue manager 558, which assembles packet information into queues per LUWU and
SUWU, and which also keeps a queue of LUWUs and SUWUs about which the EUS has not yet been
notified. Receive queue manager 558 is polled for information about LUWUs and SUWUs by the EUS via
the EUS/UIM link handler (E/U LH) 540, and responds with notification messages via UIM/EUS link handler
(U/E LH) 562. Messages which notify the EUS of the reception of a SUWU also contain the data for the
SUWU, thus completing the reception process. In the case of a LUWU, however, the EUS allocates its
memory for reception, and issues a receive request via E/U LH 540 to receive request handler 560, which
formulates a receive worklist and sends it to resource manager 554, which controls the hardware and
effects the data transfer over EUS bus 92 (FIG. 4) via a DMA arrangement. Note that the receive request
from the EUS need not be for the entire amount of data in the LUWU; indeed, all of the data may not have
even arrived at the UIM when the EUS makes its first receive request. When subsequent data for this
LUWU arrives, the EUS will again be notified and will have an opportunity to make additional receive
requests. In this fashion, the reception of the data is pipelined as much as possible in order to reduce
latency. Following data transfer, receive request handler 560 informs the EUS via U/E LH 562, and directs
memory manager 550 to de-allocate the memory for that portion of the LUWU that was delivered, thus
making that memory available for new incoming data.
In the reverse direction, i.e., from EUS 26 to MINT 11, the operation is controlled as follows. Driver 570
of EUS 26 sends a transmit request to transmit request handler 542 via U/E LH 562. In the case of a
SUWU, the transmit request itself contains the data to be transmitted, and transmit request handler 542
sends this data in a transmit worklist to resource manager 554, which computes the packet header and
writes both header and data into buffer 15 (FIG. 4), from which it is transmitted to NIM 2 by UIM/NIM link
handler 546 when authorized to do so via the flow control protocol in force on link 14. The packet is
received at NIM 2 by UIM/NIM link handler 530 and stored in buffer 94. Arbiter 532 then selects among a
plurality of buffers 94 in NIM 2 to select the next packet or SUWU to be transmitted under the control of
NIM/MINT link handler 534 on MINT link 3 to MINT 11. In the case of a LUWU, transmit request handler
542 decomposes the request into packets and sends a transmit worklist to resource manager 554, which,
for each packet, formulates the header, writes the header into buffer 15, controls the hardware to effect the
transfer of the packet data over EUS bus 92 via DMA, and directs U/N LH 546 to transmit the packet when
authorized to do so. The transmission process is then as described for the SUWU case. In either case,
transmit request handler 542 is notified by resource manager 554 when transmission of the SUWU or
LUWU is complete, whereupon driver 570 is notified via U/E LH 562 and may release its transmit buffers if
desired.

FIG. 19 also shows details of the internal software structure of EUS 26. Two types of arrangements are
shown. In one of these (blocks 572, 574, 576, 578, 580), the user system performs level 3 and higher
functions. Shown in FIG. 19 is an implementation based on Network of the Advanced Research Projects
Administration of the U.S. Department of Defense (ARPAnet) protocols, including an internet protocol 580
(level 3), and a transmission control protocol (TCP) and user datagram protocol (UDP) block 578 (TCP being
used for connection oriented service and UDP being arranged for connectionless service). At higher levels are
the remote procedure call (block 576), the network file server (block 574) and the user programs 572.
Alternatively, the services of the MAN network can be directly invoked by user programs (block 582) which
directly interface with driver 570, as indicated by the null block 584 between the user and the driver.

8.3.3.3 EUS Interface Functions

The main functional parts of the transmit EUS interface are a control interface with the EUS, and a DMA
interface to transfer data between the EUS and the UIM over the system bus. When transmitting into the
network, control information is received that describes a LUWU or SUWU to be transmitted and information
about the EUS buffers where the data resides. The control information from the EUS includes destination
MAN address, destination group (virtual network), LUWU length, and type fields for type of service and
higher level protocol type. The DMA interface moves the user data from the EUS buffers into the UIM.






The network interface portion is responsible for formatting the LUWUs and SUWUs into packets and
transmitting the packets on the link to the network. The control interface could have several variations for
flow control, multiple outstanding requests, priority, and preemption. The UIM is in control of the amount of
data that it takes from the EUS memory and sends into the network.

On the receive side, the EUS polls for information about packets that have been received and the
control interface responds with LUWU information from the packet header and current information about
how much of the EUS transaction has arrived. Over the control interface the EUS requests to receive data
from these messages, and the DMA interface will send the data from memory on the UIM into the EUS
memory buffers. The poll and response mechanism in the interface protocol on the receive side allows a lot
of EUS flexibility for receiving data from the network. The EUS can receive either partial or entire
transactions that have come from the source EUS. It also provides the flow control mechanism for the EUS
on receive: the EUS is in control of what it receives, when it receives it, and in what order.
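
The poll and response mechanism described above can be illustrated with a minimal sketch. The class and
method names (UimReceiveSide, packet_arrived, poll, receive) are hypothetical and are not taken from the
MAN design; DMA is modeled simply as returning bytes.

```python
from dataclasses import dataclass, field


@dataclass
class PendingLuwu:
    """State kept for one partially received LUWU (hypothetical structure)."""
    total_length: int                                   # LUWU length from the packet header
    data: bytearray = field(default_factory=bytearray)  # bytes that have arrived so far
    delivered: int = 0                                  # bytes already moved to the EUS


class UimReceiveSide:
    """Toy model of the UIM side of the receive control interface."""

    def __init__(self):
        self.pending = {}  # luwu_id -> PendingLuwu

    def packet_arrived(self, luwu_id, total_length, payload):
        luwu = self.pending.setdefault(luwu_id, PendingLuwu(total_length))
        luwu.data.extend(payload)

    def poll(self):
        """EUS poll: report total length and bytes arrived for each LUWU."""
        return {i: (l.total_length, len(l.data)) for i, l in self.pending.items()}

    def receive(self, luwu_id, max_bytes):
        """EUS receive request: hand over up to max_bytes of undelivered data."""
        luwu = self.pending[luwu_id]
        chunk = bytes(luwu.data[luwu.delivered:luwu.delivered + max_bytes])
        luwu.delivered += len(chunk)
        if luwu.delivered == luwu.total_length:
            del self.pending[luwu_id]  # whole LUWU delivered; release its state
        return chunk
```

Because the EUS decides when to call receive and how many bytes to ask for, it can take a partial LUWU
before the rest has arrived, which is the pipelining and receive-side flow control described in the text.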



8.3.3.4 SUN Software



This section describes how a typical end user system, a SUN-3 workstation, is connectable to MAN.
Other end user systems would use different software. The interface to MAN is relatively straightforward and
efficient for a number of systems which have been studied.



8.3.3.4.1 Existing Network Software

The Sun UNIX® operating system is derived from the 4.2BSD UNIX system from the University of
California at Berkeley. Like 4.2BSD, it contains, as part of the kernel, an implementation of the ARPAnet
protocols: internet protocol (IP), transmission control protocol (TCP) for connection-oriented service on top
of IP, and user datagram protocol (UDP) for connectionless service on top of IP. Current Sun systems use
IP as an internet sublayer in the top half of the network layer. The bottom half of the network layer is a
network specific sublayer. It currently consists of driver level software that interfaces to a specific network
hardware connection, namely an ETHERNET controller, where the link layer MAC protocol is implemented.
ETHERNET is the network currently used to connect Sun workstations. To connect Sun workstations with a
MAN network, it is necessary to fit into the framework of this existing networking software. The software for
the MAN network interface in the Sun will be driver level software.

The MAN network is naturally a connectionless or datagram type of network. LUWU data with control
information forms the EUS transaction crossing the interface into the network. Existing network services can
be provided using the MAN network datagram LUWUs as a basis. Software in the Sun will build up both
connectionless and connection-oriented transport and application services on top of a MAN datagram
network layer. Since the Sun already has a variety of network application software, the MAN driver will
provide a basic service with the flexibility to multiplex multiple upper layers. This multiplexing capability will
be necessary not just for existing applications but for additional new applications that will use MAN's power
more directly.

There needs to be an address translation service function in the EUS at the driver level in the host
software. It would allow for IP addresses to be translated into MAN addresses. The address translation
service is similar in function to the current Sun address resolution protocol (ARP), but different in
implementation. If a particular EUS needs to update its address translation tables, it sends a network
message with an IP address to a well known address translation server. The corresponding MAN address
will be returned. With a set of such address translation services, MAN can then act as the underlying
network for many different, new and existing, network software services in the Sun environment.
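
As a rough illustration of the driver-level address translation table, the following sketch caches answers
from the translation server. The class name and the query_server callable are assumptions made for this
example; the actual message exchange with the well known server is not specified here.

```python
class AddressTranslator:
    """Driver-level IP to MAN address table (illustrative sketch only)."""

    def __init__(self, query_server):
        # query_server is a hypothetical callable: IP address string -> MAN address (int).
        # It stands in for the network message sent to the address translation server.
        self.query_server = query_server
        self.table = {}

    def man_address(self, ip):
        """Return the MAN address for ip, asking the server only on a table miss."""
        if ip not in self.table:
            self.table[ip] = self.query_server(ip)
        return self.table[ip]
```

Only a table miss costs a network round trip; later LUWUs to the same IP address are translated locally.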

8.3.3.4.2 Device Driver

On the top side, the driver multiplexes several different queues of LUWUs from the higher protocols 
and applications for transmission and queues up received LUWUs in several different queues for the higher 
layers. On the hardware side, the driver sets up DMA transfers to and from user memory buffers. The driver 
must communicate with the system to map user buffers into memory that can be accessed by the DMA 
controller over the main system bus.

On transmit, the driver must do address translation on the outgoing LUWUs for those protocol layers
that are not using MAN addresses, i.e., the ARPAnet protocols. The MAN destination address and
destination group are included in MAN datagram control information that is sent when a LUWU is to be
transmitted. Other transmit control information will be LUWU length, fields indicating type of service and
higher level protocol, along with the data location for DMA. The UIM uses this control information to form
packet headers and to move the LUWU data out of EUS memory.

On receive, the driver will implement a poll/response protocol with the UIM notifying the EUS of
incoming data. The poll response will contain control information that gives source address, total LUWU
length, amount of data that has arrived up to this point, the type fields indicating higher protocol layers, and
some agreed on amount of the data from the message. (For small messages, the whole user message
could arrive in this poll response.) The driver itself has the flexibility, based on the type field, to decide how
to receive this message and which higher level entity to pass it on up to. It may be that, based on a certain
type field, it may just deliver the announcement and pass the reception decision on up to a higher layer.
Whichever approach is used, eventually a control request for the delivery of the data from the UIM to the
EUS memory is made, which results in a DMA operation by the UIM. EUS buffers to receive the data may be
preallocated for the protocol types where the driver handles the reception in a fixed fashion, or the driver
may have to get buffer information from a higher layer in the case where it has just passed the
announcement on up. This is the type of flexibility we need in the driver to handle both existing and new
applications in the Sun environment.

8.3.3.4.3 Raw MAN Interface Software



Later, as applications are written that wish to directly use the capabilities of the MAN network, the
address translation function will not be necessary. The MAN datagram control information will be specified
directly by special MAN network layer software.



9 MAN Protocols 




9.1 Overview 

The MAN protocol provides for the delivery of user data from source UIM across the network to
destination UIM. The protocol is connectionless, asymmetric for receive and send, implements error
detection without correction, and discards layer purity for high performance.



9.2 Message Scenario

The EUS sends datagram transactions called LUWUs into the network. The data that comes from the
EUS resides in EUS memory. A control message from the EUS specifies to the UIM the data length, the
destination address for this LUWU, the destination group and a type field which could contain information
like the user protocol and the network class of service required. Together, the data and the control
information form the LUWU. Depending on the type of EUS interface, this data and control can be passed
to the UIM in different ways, but it is likely that the data is passed in a DMA transfer.

The UIM will transmit this LUWU into the network. To reduce potential delay, larger LUWUs are not sent
into the network as one contiguous stream. The UIM breaks up the LUWU into fragments called packets
that can be up to a certain maximum size. A UWU smaller than the maximum size is called a SUWU and
will be contained in a single packet. Several EUSs are concentrated at the NIM and packets are transmitted
over the link from the UIM to the NIM (the EUSL). Packets from one UIM can be demand multiplexed on
the link from the NIM to the MINT (the XL) with packets from other EUSs. Delays are reduced because no
EUS has to wait for the completion of a long LUWU from another EUS sharing the link to the MINT. The
UIM generates a header for every packet that contains information from the original LUWU transaction, so
that each packet can pass through the network from source UIM to destination UIM and be recombined into
the same LUWU that was passed into the network by the source EUS. The packet header contains the
information for the network layer protocol in the MAN network.
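
The fragmentation and recombination just described might be sketched as follows. The dictionary keys and
function names are illustrative only; the real packet header fields are described in Section 9.3.2.2.

```python
def fragment_luwu(luwu_id, data, max_payload):
    """Split LUWU data into packets of at most max_payload bytes each.

    Every packet repeats the LUWU-level information the destination UIM
    needs, so each packet can travel through the network on its own.
    """
    packets = []
    for offset in range(0, len(data), max_payload):
        packets.append({
            "luwu_id": luwu_id,
            "luwu_length": len(data),
            "offset": offset,  # position of this packet's first byte within the LUWU
            "payload": data[offset:offset + max_payload],
        })
    return packets


def recombine(packets):
    """Rebuild the LUWU data from its packets (destination UIM side)."""
    ordered = sorted(packets, key=lambda p: p["offset"])
    return b"".join(p["payload"] for p in ordered)
```

A LUWU no longer than max_payload yields a single packet, which corresponds to the SUWU case.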

Before the NIM sends the packet to the MINT on the XL, it adds a NIM/MINT header to the packet
message. The header contains the source port number identifying the physical port on the NIM where a
particular EUS/UIM is connected. This header is used by the MINT to verify that the source EUS is located
at the port where it is authorized to be. This type of additional check is especially important for a data
network that serves one or more virtual networks, to ensure privacy for such virtual networks. The MINT
uses the packet header to determine the route for the packet, as well as other potential services. The MINT
does not change the contents of the packet header. When the XLH in the MINT passes the packet out
through the switch to be sent out on the XL to the destination NIM, it places a different port number in the
NIM/MINT header. This port number is the physical port on the NIM where the destination EUS/UIM is
connected. The destination NIM uses this port number to route the packet on the fly to the proper EUSL.

The various sections of a packet are identified by delimiters according to the link format. Such
delimiters occur between the NIM/MINT header 600 and the MAN header 610, and between the MAN
header and the rest of the packet. The delimiter at the MAN header/rest of packet border is required to
signal the header check sequence circuit to insert or check the header check. The NIM broadcasts a
received packet to all ports in the NIM/MINT header field.
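
The port handling at the MINT can be summarized in a small sketch. The function name and the two lookup
tables are hypothetical; they stand in for the source checking and routing data held in the MINT.

```python
def mint_forward(source_port, packet, authorized_port, dest_port_of):
    """Check the NIM-supplied source port, then swap in the destination port.

    authorized_port: source MAN address -> port the source is authorized to use
    dest_port_of:    destination MAN address -> port on the destination NIM
    Returns (destination_port, packet), or None if the source check fails.
    The packet (MAN header and data) is passed through unchanged.
    """
    if authorized_port.get(packet["source"]) != source_port:
        return None  # source EUS is not at the port where it is authorized to be
    return dest_port_of[packet["destination"]], packet
```

Because the port number comes from the NIM and not from the EUS, a packet carrying another user's source
address is rejected in this model unless it also arrives on that user's port.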

When the packet arrives at the destination UIM, the packet header contains the original information from
the source UIM necessary to reassemble the source EUS transaction. There is also enough information to
allow a variety of EUS receive interface approaches including pipelining or other variations of EUS
transaction size, prioritization, and preemption.

9.3 MAN Protocol Description



9.3.1 Link Layer Functions

The link functions are described in Section 5. The functions of message beginning and end demarcation,
data transparency, and message check sequences on the EUSL and XL links are discussed there.

A check sequence for the whole packet message is performed at the link level, but instead of corrective
action being taken there, an indication of the error is passed on up to the network layer for handling there. A
message check sequence error results only in incrementing an error count for administrative purposes, but
the message transmission continues. A separate header check sequence is calculated in hardware in the
UIM. A header check sequence error detected by the MINT control results in the message being thrown
away and an error count being incremented for administrative purposes. At the destination UIM a header
check sequence error also results in the message being thrown away. The data check sequence result can
be conveyed to the EUS as part of the LUWU arrival notification, and the EUS can determine whether or not
to receive the message. These violations of layer purity have been made to simplify the processing at the
link layer to increase speed and overall network performance.
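
The different treatment of the two check sequences can be captured in a short sketch. The function and
counter names are invented for illustration, and counting header errors at the destination UIM is an
assumption (the text states this explicitly only for the MINT).

```python
def handle_packet(header_ok, message_ok, counters):
    """Apply the error policy described above to one received packet.

    A header check sequence error causes the packet to be thrown away.
    A message check sequence error is only counted; the packet is still
    passed on, and the result is reported so the EUS can decide what to do.
    """
    if not header_ok:
        counters["header_errors"] += 1  # assumed administrative count
        return None                     # packet discarded
    if not message_ok:
        counters["message_errors"] += 1
    return {"deliver": True, "data_check_ok": message_ok}
```

No retransmission is requested in either case, which matches the statement that error detection is
performed without correction.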

Other "standard" link layer functions like error correction and flow control are not performed in the
conventional manner. There are no acknowledgement messages returned at the link level for error
correction (retransmission requests) or for flow control. Flow control is signaled using special bits in the
framing pattern. The complexity of X.25-like protocols at the link level can be tolerated for low speed links
where the processing overhead will not reduce performance and does increase the reliability of links that
have high error rates. However, it is felt that an acceptable level of error-free throughput will be achieved by
the low bit error rates in the fiber optic links in this network (bit error rate less than 10 errors per trillion
bits). Also, because of the large amounts of buffer memory in the MINT and the UIM necessary to handle
data from the high-speed links, it was felt that flow control messages would not be necessary or effective.



9.3.2 Network Layer

9.3.2.1 Functions

The message unit that leaves the source UIM and travels all the way to the destination UIM is the
packet. The packet is not altered once it leaves the source UIM.
The information in the UIM to UIM message header will allow the following functions to be performed:
- fragmentation of LUWUs at the source UIM,
- recombination of LUWUs at the destination UIM,
- routing to the proper NIM at the MINT,
- routing to the proper UIM/EUS port at the destination NIM,
- MINT transmission of variable length messages (e.g., SUWU, packet, n packets),
- destination UIM congestion control and arrival announcement,
- detection and handling of message header errors,
- addressing of network entities for internal network messages,
- EUS authentication for delivery of network services only to authorized users.



9.3.2.2 Format

FIG. 20 shows the UIM to MINT message format. The MAN header 610 consists of the Destination
Address 612, the Source Address 614, the group (virtual network) identifier 616, group name 618, the type
of service 620, the Packet Length (the header plus data in bytes) 622, a type of service indicator 623, a
protocol identifier 624 for use by end user systems for identifying the contents of EUS to EUS header 630,
and the Header Check Sequence 626. The header is of fixed length, seven 32-bit words or 224 bits long.
The MAN header is followed by an EUS to EUS header 630 to process message fragmentation. This
header includes a LUWU identifier 632, a LUWU length indicator 634, the packet sequence number 636, the
protocol identifier 638 for identifying the contents of the internal EUS protocol which is the header of user
data 640, and the number 639 of the initial byte of data of this packet within the total LUWU of information.
Finally, user data 640 may be preceded, for appropriate user protocols, by the identity of the destination port
642 and source port 644. The fields are 32 bits because that is the most efficient length (integers) for
present network control processors. Error checking is performed on the header in control software; this is
the Header Check Sequence. At the link level, error checking is done over the whole message; this is the
Message Check Sequence 646. The NIM/MINT header 600 (explained below) is also shown in the figure for
completeness.
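
To make the fixed 224-bit size concrete, the sketch below packs one possible layout of the MAN header. The
field order, the 32-bit widths assumed for the group identifier, group name, protocol identifier and header
check sequence, and the omission of a separate field for item 620 are all assumptions for illustration;
FIG. 20 is the authoritative layout.

```python
import struct

# Assumed layout: four 32-bit words, two 16-bit halves sharing one word,
# then two more 32-bit words: 7 x 32 bits = 224 bits, big-endian.
MAN_HEADER = struct.Struct(">IIIIHHII")


def pack_man_header(dest, source, group_id, group_name,
                    packet_length, type_of_service, protocol_id, hcs):
    """Pack the assumed MAN header fields into 28 bytes."""
    return MAN_HEADER.pack(dest, source, group_id, group_name,
                           packet_length, type_of_service, protocol_id, hcs)


def unpack_man_header(raw):
    """Inverse of pack_man_header; returns the fields as a tuple."""
    return MAN_HEADER.unpack(raw)
```

With this layout the 16-bit Packet Length and 16-bit type of service field share a single word, which is one
way the listed fields could fit in seven words.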

The destination address, group identification, type of service, and the source address are placed as the
first five fields in the message for efficiency in MINT processing. The destination and group identification
are used for routing, the size for memory management, the type fields for special processing, and the
source is used for service authentication.

9.3.2.2.1 Destination Address

The Destination Address 612 is a MAN address that specifies to which EUS the packet is being sent. A
MAN address is 32 bits long and is a flat address that specifies an EUS connected to the network. (In
internal network messages, if the high order bit in the MAN address is set, the address specifies an internal
network entity like a MINT or NIM, instead of an EUS.) A MAN address will be permanently assigned to an
EUS and will identify an EUS even if it moves to a different physical location on the network. If an EUS
moves, it must sign in with a well-known routing authentication server to update the correspondence
between its MAN address and the physical port on which it is located. Of course, the port number is
supplied by the NIM so the EUS cannot cheat about where it is located.

In the MINT the destination address will be used to determine a destination NIM for routing the
message. In the destination NIM the destination address will be used to determine a destination UIM for
routing the message.
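
The high order bit convention for internal network entities amounts to a one-line test on the 32-bit address.
The constant and function names below are illustrative.

```python
INTERNAL_ENTITY_BIT = 1 << 31  # high order bit of a 32-bit MAN address


def is_internal_entity(man_address):
    """True if the address names an internal entity such as a MINT or NIM."""
    return bool(man_address & INTERNAL_ENTITY_BIT)
```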

9.3.2.2.2 Packet Length

The Packet Length 622 is 16 bits long and represents the length in bytes of this message fragment
including the fixed length header and the data. This length is used by the MINT for transmitting the
message. It is also used by the destination UIM to determine the amount of data available for delivery to the
EUS.



9.3.2.2.3 Type Fields

The type of service field 623 is 16 bits long and contains the type of service specified in the original
EUS request. The MINT may look at the type of service and handle the message differently. The
destination UIM may also look at the type of service to determine how to deliver the message to the
EUS [...] various streams of data from the network.

9.3.2.2.4 Packet Sequence Number

This is a Packet Sequence Number 636 for this particular LUWU transmission. It helps the receiving
UIM recombine the incoming LUWU, so that it can determine if any fragments have been
lost because of error. The sequence number is incremented for each fragment of the LUWU. The last
sequence number is negative to indicate the last packet of a LUWU. (An SUWU would have -1 as its
sequence number.) If an infinite length LUWU is being sent, the Packet Sequence Number should wrap
around. (See UWU Length, Section 9.3.2.2.7, for an explanation of an infinite length LUWU.)
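
A sketch of this numbering scheme for a finite LUWU follows. It assumes that numbering starts at 1 and that
the last packet carries the negated count, which is consistent with an SUWU having -1 but is not stated
explicitly; wrap-around for infinite length LUWUs is not modeled.

```python
def sequence_numbers(n_packets):
    """Sequence numbers for a LUWU of n_packets: 1, 2, ..., then -n for the last."""
    return [i if i < n_packets else -i for i in range(1, n_packets + 1)]


def is_complete(received):
    """True if the received numbers form an unbroken run ending in a negative value.

    Since packets cannot get out of order in the network, a gap in the run
    or a missing negative terminator means a fragment was lost.
    """
    return len(received) > 0 and list(received) == sequence_numbers(len(received))
```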

9.3.2.2.5 Source Address

The Source Address 614 is 32 bits long and is a MAN address that specifies the EUS that sent the
message. (See Destination Address for an explanation of MAN address.) The Source Address will be
used in the MINT for network accounting. Coupled with the Port Number 600 from the NIM/MINT header, it
is used by the MINT to authenticate the source EUS for network services. The Source Address will be
delivered to the destination EUS so that it knows the network address of the EUS that sent the message.



9.3.2.2.6 UWU ID

The UWU ID 632 is a 32 bit number that is used by the destination UIM to recombine a UWU. Note that
the recombination job is made easier because fragments cannot get out of order in the network. The UWU
ID, along with the Source and Destination Addresses, identifies packets of the same LUWU, or in other
words, fragments of the original datagram transaction. The ID must be unique for the source and destination
pair for the time that any fragment is in the network.



9.3.2.2.7 UWU Length 



The UWU Length 634 is 32 bits long and represents the total length of UWU data in bytes. In the first
packet of a LUWU this will allow the destination UIM to do congestion control, and if the LUWU is pipelined
into the EUS, it will allow the UIM to begin a LUWU announcement and delivery before the complete LUWU
arrives at the UIM.

A Length that is negative indicates an infinite length LUWU, which is like an open channel between two
EUSs. Closing down an infinite length LUWU is done by sending a negative Packet Sequence Number. An
infinite length LUWU only makes sense where the UIM controls the DMA into EUS memory.

9.3.2.2.8 Header Check Sequence

There is a header check sequence 626, calculated by the transmitting UIM for header information, so
that the MINT and the destination UIM can determine if the header information was received correctly. The
MINT or the destination UIM will not attempt delivery of a packet with a header check sequence error.

9.3.2.2.9 User Data



The user data 640 is the portion of the user UWU data that is transmitted in this fragment of the
transmission. Following the data is the overall message check sequence 646 calculated at the link level.



9.3.3 NIM/MINT Layer

9.3.3.1 Functions

This protocol layer consists of a header containing a NIM port number 600. The port number has a one
to one correspondence to an EUS connection on the NIM and is prepended by the NIM in block 403 (FIG.
16) so that the user cannot enter false data therein. This header is positioned at the front of a packet
message and is not covered by the overall packet message check sequence. It is checked by a group of
parity bits in the same word to enhance its error reliability. The incoming message to the MINT contains the
source NIM port number to assist in user authentication for network services that might be requested in the
type fields. The outgoing message from the MINT contains the destination NIM port number in place of the
source port 600 in order to speed the demultiplexing/routing by the NIM to the proper destination EUS. If
the packet has a plurality of destination ports in one NIM, a list of these ports is placed at the beginning of
the packet so that section 600 of the header becomes several words long.



10 LOGIN PROCEDURES AND VIRTUAL NETWORKS



10.1 General

A system such as MAN is naturally most cost effective when it can serve a large number of customers.
Such a large number of customers is likely to include a number of sets of users who require protection from
outsiders. Such users can conveniently be grouped into virtual networks. In order to provide still further
flexibility and protection, individual users may be given access to a number of virtual networks. For
example, all the users of one company may be on one virtual network and the payroll department of that
company may be on a separate virtual network. The payroll department users should belong to both of
these virtual networks, since they may need access to general data about the corporation, but the users
outside the payroll department should not be members of the virtual network of the payroll department,
since they should not have access to payroll records.

The login procedure, method of source checking, and the method of routing are the arrangements which
permit the MAN system to support a large number of virtual networks while providing an optimum level of
protection against unauthorized data access. Further, the arrangement whereby the NIM prepends the user
port to every packet gives additional protection against access of a virtual network by an unauthorized user
by preventing aliasing.



10.2 Building Up the Authorization Data Base

FIG. 15 illustrates the administrative control of the MAN network. A data base is stored in disk 351,
accessed via operation, administration, and maintenance (OA&M) system 350, for authorizing users in
response to a login request. For a large MAN network, OA&M system 350 may be a distributed
multiprocessor arrangement for handling a large volume of login requests. This data base is arranged so
that users cannot access restricted virtual networks of which they are not members. The data base is under
the control of three types of super users. A first super user would in general be an employee of the
common carrier that is supplying MAN service. This super user, referred to for convenience herein as a
level 1 super user, assigns a block of MAN names, which would in general consist of a block of numbers, to
each user group and assigns type 2 and type 3 super users to particular ones of these names. The level 1
super user also assigns virtual networks to particular MAN groups. Finally, a level 1 super user has the
authority to create or destroy a MAN supplied service such as electronic "yellow page" service. A type 2
super user assigns valid MAN names from the block assigned to the particular user community, and
assigns physical port access restrictions where appropriate. In addition, a type 2 super user has the
authority to restrict access to certain virtual networks by sets of members of his customer community.

Type 3 super users, who are broadly equal in authority to type 2 super users, have the authority to grant
MAN names access to their virtual networks. Note that such access can only be granted by a type 3 super
user if the MAN name's type 2 super user has allowed this MAN name user the capability of joining this
group by an appropriate entry in table 370.

The data base includes table 360, which provides, for each user identification 362, the password 361, the
group 363 accessible using that password, a list of ports and, for special cases, directory numbers 364 from
which that user may transmit and/or receive, and the type of service 365, i.e., receive only, transmit only, or
receive and transmit.

The data base also includes user-capability tables 370, 375 for relating users (table 370) to groups (table
375) potentially authorizable for each user. When a user is to be authorised by a super user to access a
group, this table is checked to see if that group is in the list of table 370; if not, the request to authorize that
user for that group will be rejected. Super users have authority to enter data for their group and their groups
in tables 370, 375. Super users also have the authority for their user to move a group from table 375 into the
list of groups 363 of the user/group authorization table 360. Thus, for a user to access an outside group,
super users from both groups would have to authorize the access.
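
The two-step authorization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class name `AuthDatabase` and the fields `capability` and `authorized` are hypothetical, and the tables are reduced to in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class AuthDatabase:
    """Toy model of tables 360 and 370/375 (all names are illustrative)."""
    # Tables 370/375: groups each user may potentially be authorized for.
    capability: dict = field(default_factory=dict)   # user -> set of groups
    # Table 360 (groups 363): groups each user is actually authorized for.
    authorized: dict = field(default_factory=dict)   # user -> set of groups

    def allow_capability(self, user, group):
        """The user's own super user permits the user to join `group`."""
        self.capability.setdefault(user, set()).add(group)

    def authorize(self, user, group):
        """The group's super user grants access; the request is rejected
        unless the capability entry already exists."""
        if group not in self.capability.get(user, set()):
            return False
        self.authorized.setdefault(user, set()).add(group)
        return True
```

In this sketch, access to an outside group requires both calls, one by each super user, which mirrors the requirement that super users from both groups authorize the access.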

10.3 Login Procedure

At login time, a user who has previously been appropriately authorised according to the arrangements
described above sends an initial login request message to the MAN network. This message is destined not
for any other user, but for the MAN network itself. Effectively, this message is a header only message which
is analyzed by the MINT central control. The password, type of login service being requested, MAN group,
MAN name and port number are all in the MAN header of a login request, replacing other fields. This is
done because only the header is passed by the XLH to the MINT central control, for further processing by
the OA&M central control. The login data, which includes the MAN name, the requested MAN group name
(virtual network name), and the password, are compared against the login authorization data base 351 to
check whether the particular user is authorised to access that virtual network from the physical port to which
that user is connected (the physical port was prepended by the NIM prior to reception of the login packet
by the MINT). If the user is in fact properly authorized, then the tables in source checker 307 and in router
309 (FIG. 14) are updated. Only the source checker table of the checker that processes the login user's port
is updated from a login for terminal operations. If a login request is for receive functions, then the routing
tables of all MINTs must be updated to allow that source to receive data from any authorized connectable
user of the same group who may be connected to other MINTs to respond to requests. The source checker
table 308 includes a list of authorized name/group pairs for each port connected to the NIM that sends the
data stream to the XLH for that source checker. The router tables 310 all include entries for all users
authorized to receive UWUs. Each entry includes a name/group pair, and the corresponding NIM and port
number. The entries in the source checker list are grouped by group identification numbers. The group
identification number 616 is part of the header of subsequent packets from the logged in user, and is
derived by the OA&M system 350 at login time and sent back by the OA&M system via the MAN switch 10
to the login user. The OA&M system 350 uses the MINT central control's 20 access 19 to the MINT
memory 18 to enter the login acknowledge to the login user. On subsequent packets, as they are received
in the MINT, the source checker checks the port number, MAN name and MAN group against the
authorization table in the source checker with the result that the packet is allowed to proceed or not. The
router then checks to see if the destination is an allowable destination for that input by checking the virtual
network group name and the destination name. As a result, once a user is logged in, the user can reach
any destination that is in the routing tables, i.e., that has previously logged in for access in the read only
mode or the read/write mode, and that has the same virtual network group name as requested in the login;
in contrast, unauthorised users are blocked in every packet.
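
The login procedure and the subsequent per-packet checks can be illustrated by the following sketch. It is offered only as an aid to understanding and rests on several assumptions: the names `Mint`, `login` and `check_packet` are hypothetical, the login data base is reduced to a dictionary, and a port is identified by a single number rather than by a NIM and port pair.

```python
class Mint:
    """Toy model of one MINT's source checker table and router table."""

    def __init__(self):
        self.source_table = {}   # port -> set of (name, group) pairs
        self.router_table = {}   # (name, group) -> (nim, port)


def login(db, mints, home, name, group, password, nim, port, service):
    """Check a login request against the data base and update the tables.

    `db` maps name -> (password, groups, ports); `home` is the MINT that
    serves the login user's port; `service` is "transmit", "receive" or
    "both". All of these names are assumptions made for the sketch.
    """
    entry = db.get(name)
    if entry is None:
        return False
    stored_password, groups, ports = entry
    if stored_password != password or group not in groups or port not in ports:
        return False
    if service in ("transmit", "both"):
        # Only the checker that handles the login user's port is updated.
        home.source_table.setdefault(port, set()).add((name, group))
    if service in ("receive", "both"):
        # Routing tables of all MINTs learn where this user can be reached.
        for mint in mints:
            mint.router_table[(name, group)] = (nim, port)
    return True


def check_packet(mint, port, src, group, dst):
    """Per-packet check: source checker first, then router lookup.

    Returns the (nim, port) of the destination, or None if blocked.
    """
    if (src, group) not in mint.source_table.get(port, set()):
        return None
    return mint.router_table.get((dst, group))
```

In this sketch a packet is forwarded only if the source name/group pair is listed for the arriving port and the destination has logged in for receiving under the same group; every other packet is blocked.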

While in the present embodiment the checking is done for each packet, it could also be done for each
user work unit (LUWU or SUWU), with a recorded indication that all subsequent packets of a LUWU whose
original packet was rejected are also to be rejected, or by rejecting all LUWUs whose initial packet is
missing at the user system.

Those super user logins which are associated with making changes in the login data base are checked
in the same way as conventional logins except that it is recognized in OA&M system 350 as a login request
for a user who has authority for changing the data base stored on disk 351.

Super users types 2 and 3 get access to the OA&M system 350 from a computer connected to a user
port of MAN. OA&M system 350 derives statistics on billing, usage, authorisations and performance which
the super users can access from their computers.

The MAN network can also serve special types of users such as transmit only users and receive only
users. An example of a transmit only user is a broadcast stock quotation system or a video transmitter.
Outputs of transmit only users are only checked in source checker tables. Receive only units such as
printers or monitoring devices are authorised by entries in the routing tables.

11 APPLICATION OF MAN TO VOICE SWITCHING

FIG. 22 shows an arrangement for using the MAN architecture to switch voice as well as data. In order
to simplify the application of this architecture to such services, an existing switch, in this case the 5ESS
switch manufactured by AT&T Network Systems, is used. The advantage of using an existing switch is that
it avoids the necessity for developing a program to control a local switch, a very large development effort.
By using an existing switch as the interface between the MAN and voice users, this effort can be almost
completely eliminated. Shown on FIG. 22 is a conventional customer telephone connected to a switching
module 1207 of 5ESS switch 1200. This customer telephone could also be a combined integrated services
digital network (ISDN) voice and data customer station which can also be connected to a 5ESS switch.
Other customer stations 1202 are connected through a subscriber loop carrier system 1203 which is
connected to a switching module 1207. The switching modules 1207 are connected to a time multiplex
switch 1209 which sets up connections between switching modules. Two of these switching modules are
shown connected to an interface 1210 comprising Common Channel Signaling 7 (CCS 7) signaling channels
1211, pulse code modulation (PCM) channels 1213, and special signaling channels 1215. These are
connected to a packet assembler and disassembler (PAD) 1217 for interfacing with a MAN NIM 2. The function
of the PAD is to interface between the PCM signals which are generated in the switch and the packet
signals which are switched in the MAN network. The function of the special signaling channel 1215 is to
inform PAD 1217 of the source and destination associated with each PCM channel. The CCS 7 channels
transmit packets which require further processing by PAD 1217 to get them into the form necessary for
switching by the MAN network. To make the system less vulnerable against the failure of equipment or
transmission facilities, the switch is shown as being connected to two different NIMs of the MAN network. A
digital PBX 1219 also interfaces with packet assembler disassembler 1217 directly. In a subsequent
upgrade of the PAD, it would be possible to interface directly with SLC 1203 or with telephones such as
integrated services digital network (ISDN) telephones that generate a digital voice bit stream directly.

The NIMs are connected to a MAN Hub 1230. The NIMs are connected to MINTs 11 of that hub. The
MINTs 11 are interconnected by MAN switch 22.

For this type of configuration, it is desirable to switch substantial quantities of data as well as voice in
order to utilize the capabilities of the MAN hub most effectively. Voice packets, in particular, have very short
delay requirements in order to minimize the total delay encountered in transmitting speech from a source to
a destination and in order to ensure that there is no substantial interpacket gap which would result in the
loss of a portion of the speech signal.

The basic design parameters for MAN have been selected to optimise data switching, and have been
adapted in a most straightforward manner as shown in FIG. 22. If a large amount of voice packet switching
is required, one or more of the following additional steps can be taken:

1. A form of coding such as adaptive differential PCM (ADPCM) which offers excellent performance
at 32 Kbit/second could be used instead of 64 Kbit PCM. Excellent coding schemes are also available
which require fewer than 32 Kbit/sec. for good performance.

2. Packets need only be sent when a customer is actually speaking. This reduces the number of
packets that must be sent by at least 2:1.

3. The size of the buffer for buffering voice samples could be increased above the storage for 256
voice samples (a two packet buffer) per channel. However, longer voice packets introduce more delay
which may or may not be tolerable depending on the characteristics of the rest of the voice network.

4. Voice traffic might be concentrated in specialist MINTs to reduce the number of switch setup
operations for voice packets. Such an arrangement may enlarge the number of customers affected by a
failure of a NIM or MINT and might require arrangements for providing alternate paths to another NIM
and/or MINT.

5. Alternate hub configurations can be used.
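
The effect of steps 1 to 3 can be estimated with a short calculation. The sketch below assumes 8-bit PCM samples, so that 64 Kbit/second PCM corresponds to 8000 samples per second; it also ignores packet header overhead and treats the 2:1 reduction of step 2 as an activity factor of 0.5. The function names are hypothetical.

```python
PCM_BIT_RATE = 64_000      # bit/s, from the text
BITS_PER_SAMPLE = 8        # assumption: standard 8-bit PCM samples
SAMPLE_RATE = PCM_BIT_RATE // BITS_PER_SAMPLE   # samples per second


def buffer_delay_ms(samples):
    """Time needed to fill a buffer of `samples` voice samples."""
    return 1000 * samples / SAMPLE_RATE


def channel_bit_rate(coding_rate, activity=1.0):
    """Average bit rate of one direction of speech, ignoring headers."""
    return coding_rate * activity


two_packet_buffer_ms = buffer_delay_ms(256)   # the 256-sample buffer
one_packet_ms = buffer_delay_ms(128)          # half of a two packet buffer
adpcm_with_silence = channel_bit_rate(32_000, activity=0.5)
```

Under these assumptions the 256-sample buffer holds 32 milliseconds of speech, so enlarging it as in step 3 trades fewer packets for proportionally more buffering delay.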

The alternate hub configuration of FIG. 24 is an example of a step 5 solution. A basic problem of
switching voice packets is that in order to minimize delay in transmitting voice, the voice packets must
represent only a short segment of speech, as low as 20 milliseconds according to some estimates. This
corresponds to as many as 50 packets per second for each direction of speech. If a substantial fraction of
the input to a MINT represented such voice packets, the circuit switch setup time might be too great to
handle such traffic. If only voice traffic were being switched, a packet switch which would not require circuit
setup operations might be needed for high traffic situations.
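
The packet rate quoted above follows directly from the segment length, as the following sketch shows. The function names and the example figure of 1000 voice channels are assumptions chosen only for illustration.

```python
def packets_per_second(segment_ms):
    """Packets per second for one direction of speech."""
    return 1000 / segment_ms


def voice_packets_into_mint(channels, segment_ms, activity=1.0):
    """Voice packet arrival rate at a MINT carrying `channels` directions
    of speech; `activity` models sending packets only during speech."""
    return channels * packets_per_second(segment_ms) * activity


rate_per_direction = packets_per_second(20)          # 20 ms segments
rate_for_1000_channels = voice_packets_into_mint(1000, 20)
```

With 20 millisecond segments each direction of speech produces 50 packets per second, so a hypothetical MINT carrying 1000 such directions would see 50,000 voice packets per second unless packets are chained or suppressed during silence.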

One embodiment of such a packet switch 1300 comprises a group of MINTs 1313 interconnected like a
conventional array of space division switches wherein each MINT 1313 is connected to four others, and
enough stages are added to reach all output MINTs 1312 that carry heavy voice traffic. For added
protection against equipment failure, the MINTs 1313 of the packet switch 1300 could be interconnected
through MANS 10 in order to route traffic around a defective MINT 1313 and to use a spare MINT 1313
instead.

The output bit stream of NIM 2 is connected to one of the inputs (XL) of an input MINT 1311. The
packet data traffic leaving input MINT 1311 can continue to be switched through MANS 10. In this
embodiment, the data packet output of MANS 10 is merged with the voice packet output of data switch
1300 in an output MINT 1312 which receives the outputs of MANS 10 and data switch 1300 on the XL 16
(input) side and whose IL 17 output is the input bit stream of NIM 2, produced by a PASC circuit 290 (FIG.
13). Input MINT 1311 does not contain the PASC circuit 290 (FIG. 13) for generating the output bit stream
to NIM 2. For output MINT 1312 the inputs to the XLs from MANS 10 pass through a phase alignment
circuit 292 (FIG. 13) such as that shown in FIG. 23, since such inputs come from many different sources
through circuit paths that insert different delay.

This arrangement can also be used for switching high priority data packets through the packet switch
1300 while retaining the circuit switch 10 for switching low priority data packets. With this arrangement it is
not necessary to connect the packet switch 1300 to output MINTs 1312 carrying no voice traffic; in that
case, high priority packets to MINTs carrying no voice traffic would have to be routed through circuit switch
MANS 10.

FIG. 26 shows another alternate configuration; in this configuration, while data packets are switched
once through the circuit switch as previously described, voice packets are switched twice through the space
division switch. In FIG. 26 the MINTs 11 are broken down into two groups. The first group, consisting of
MINT 11-0 through MINT 11-239, are used in the conventional way and have both voice and data packet
inputs from the NIMs to which they are connected by a link 3. When one of the MINTs 11-0,...,11-239
recognizes a voice packet, it prepares to send that voice packet through the circuit switch MANS 10 to one
of 16 specialist voice packet switch modules, MINTs 11-240,...,11-255. Each of the MINTs 11-0,...,11-239
can then assemble voice packets in only 16 different groups, one group for each of the voice packet
switching modules, MINTs 11-240,...,11-255, so that any circuit connection from one of the MINTs
11-0,...,11-239 can carry voice packets destined for 1/16th of the 960 NIMs connected to the 240 voice and
data packet switch modules.

A voice packet or a chained series of voice packets destined for one of the voice packet switch
modules, MINTs 11-240,...,11-255, is connected from the output of MANS 10 to an input of such a MINT.
The voice packet switch MINT then separates each incoming packet stream into 15 possible destinations
and assembles voice packets received from any of the voice and data packet switch modules, MINTs
11-0,...,11-239, for each of the 15 destinations (NIMs) served by each of the voice packet switch modules,
MINTs 11-240,...,11-255. Each of the latter MINTs then transmits a chain of packets for each of the 15 NIMs
served by that MINT through MANS 10 to the one of the outlets of MANS 10 that is connected to the
correct destination NIM.

This arrangement sharply reduces the number of connections that must be set up through MANS 10 for
transmitting voice packets since each voice and data packet MINT has only 16 voice packet destinations
(MINTs 11-240,...,11-255) and each voice packet switch MINT 11-240,...,11-255 has only 15 destinations,
i.e., the 15 NIMs that it serves. This is in contrast to a comparable single stage arrangement whereby each
voice and data packet switch module must set up connections to up to 960 different NIMs.



12 MINT CONTROL TO MAN SWITCH CONTROL

FIG. 21 illustrates one arrangement for controlling access by MINTs 11 to the MAN switch control 22.
Each MINT has an associated access controller 1120. A data ring 1102, 1104, 1106 distributes data indicating
the availability of output links to each logic and count circuit 1100 of each access controller. Each access
controller 1120 maintains a list 1110 of output links such as 1112 to which it wants to send data, each link
having an associated priority indicator 1114. A MINT can seize an output link of that list by marking the link
unavailable in ring 1102 and transmitting an order to the MAN switch control 22 to set up a path from an
ILH of that MINT to the requested output link. When the full data block to be transmitted to that output link
has been so transmitted, the MINT marks the output link available in the data transmitted by data ring 1102
which thereby makes that output link available for access by other MINTs.

A problem with using only availability data is that during periods of congestion the time before a
particular MINT may get access to an output link can be excessive. In order to even the accessibility of any
output link to any MINT, the following arrangement is used. Associated with each link availability indication,
called a ready bit, transmitted in ring 1102, is a window bit transmitted in ring 1104. The ready bit is
controlled by any MINT that seizes or releases an output link. The window bit is controlled by the access
controller 1120 of only a single MINT called, for the purposes of this description, the controlling MINT. In
this particular embodiment, the controlling MINT for a given output link is the MINT to which the
corresponding output link is routed.

The effect of an open window (window bit = 1) is to let the first access controller on the ring that wants
to seize an output link and recognises its availability as the ready bit passes the controller seize such a
link, and to let any controller which tries to seize an unavailable link set the priority indicator 1114 for that
unavailable link. The effect of a closed window (window bit = 0) is to permit only controllers which have a
priority indicator set for a corresponding available link to seize that available link. The window is closed by
the access controller 1120 of the controlling MINT whenever the logic and count circuit 1100 of that
controller detects that the output link is not available (ready bit = 0) and is opened whenever that controller
detects that that output link is available (ready bit = 1).

The operation of an access controller seizing a link is as follows. If the link is unavailable (ready bit =
0) and the window bit is one, the access controller sets the priority indicator 1114 for that output link. If the
link is unavailable and the window bit is zero, the controller does nothing. If the link is available and the
window bit is one, the controller seizes the link and marks the ready bit zero to ensure that no other
controller seizes the same link. If the link is available and the window bit is zero, then only a controller
whose priority indicator 1114 is set for that link can seize that link and will do so by marking the ready bit
zero. The action of the access controller of the controlling MINT on the window bit is simpler: that controller
simply copies the value of the ready bit into the window bit.
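
The four cases above, together with the action of the controlling MINT, can be sketched as a small simulation. This is an illustration under stated assumptions rather than the circuit of the embodiment: the names `LinkSlot` and `AccessController` are hypothetical, a single link is modelled, a release is applied directly to the slot instead of waiting for the bit to pass, and the controlling MINT is assumed to copy the ready bit after applying its own seize attempt.

```python
from dataclasses import dataclass, field


@dataclass
class LinkSlot:
    """Ready and window bits for one output link as they circulate."""
    ready: int = 1    # 1 = link available
    window: int = 1   # 1 = window open


@dataclass
class AccessController:
    wanted: set = field(default_factory=set)     # links this MINT wants
    priority: set = field(default_factory=set)   # priority indicators set
    controls: set = field(default_factory=set)   # links this MINT controls
    seized: set = field(default_factory=set)     # links currently held

    def visit(self, link, slot):
        """Apply the rules as the bits for `link` pass this controller."""
        if link in self.wanted:
            if slot.ready == 0:
                if slot.window == 1:
                    self.priority.add(link)      # note that we are waiting
            elif slot.window == 1 or link in self.priority:
                slot.ready = 0                   # seize the link
                self.seized.add(link)
                self.wanted.discard(link)
                self.priority.discard(link)
        if link in self.controls:
            slot.window = slot.ready             # copy ready into window

    def release(self, link, slot):
        """Mark the link available again after the data block is sent."""
        self.seized.discard(link)
        slot.ready = 1
```

In this sketch a controller that found the link busy while the window was open keeps its priority indicator, so when the link is released with the window closed it is served ahead of a controller whose request arrived later.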

In addition to the ready and window bits, a frame bit is circulated in ring 1106 to define the beginning of
a frame of resource availability data, hence to define the count for identifying the link associated with each
ready and window bit. Data on the three rings 1102, 1104 and 1106 circulates serially and in synchronism
through the logic and count circuit 1100 of each MINT.

The result of this type of operation is that those access controllers which are trying to seize an output
link and which are located between the unit that first successfully seized that output link and the access
controller that controls the window bit have priority and will be served in turn before any other controllers
that subsequently may make a request to seize the specific output link. As a result, an approximately fair
distribution of access by all MINTs to all output links is achieved.

If this alternative approach to controlling MINT 11 access control to the MANSC 22 is used, priority is
controlled from the MINT. Each MINT maintains a priority and a regular queue for queuing requests, and
makes requests for MANSC services first from the MINT priority queue.
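
The two-queue discipline can be sketched as follows. The class name `SetupRequestQueues` and its methods are hypothetical and are offered only to illustrate the ordering of requests; a request is represented simply by the number of the wanted outlet.

```python
from collections import deque


class SetupRequestQueues:
    """Priority and regular queues of switch setup requests in one MINT."""

    def __init__(self):
        self.priority = deque()
        self.regular = deque()

    def enqueue(self, outlet, high_priority=False):
        """Queue a request for a connection to `outlet`."""
        (self.priority if high_priority else self.regular).append(outlet)

    def next_request(self):
        """Return the next request to send to the MANSC, or None if idle."""
        if self.priority:
            return self.priority.popleft()
        if self.regular:
            return self.regular.popleft()
        return None
```

In this sketch regular requests are served in arrival order, but only when the priority queue is empty.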

13 CONCLUSION

It is to be understood that the above description is only of one preferred embodiment of the invention.
Numerous other arrangements may be devised by one skilled in the art without departing from the spirit
and scope of the invention. The invention is thus limited only as defined in the accompanying claims.

APPENDIX A
ACRONYMS AND ABBREVIATIONS

1SC      First Stage Controller
2SC      Second Stage Controller
ACK      Acknowledge
ARP      Address Resolution Protocol
ARQ      Automatic Repeat Request
BNAK     Busy Negative Acknowledge
CC       Central Control
CNAK     Control Negative Acknowledge
CNet     Control Network
CRC      Cyclic Redundancy Check or Code
DNet     Data Network
DRAM     Dynamic Random Access Memory
DVMA     Direct Virtual Memory Access
EUS      End User System
EUSL     End User Link (Connects NIM and UIM)
FEP      Front End Processor
FIFO     First In First Out
FNAK     Fabric Blocking Negative Acknowledge
IL       Internal Link (Connects MINT and MANS)
ILH      Internal Link Handler
IP       Internet Protocol
LAN      Local Area Network
LUWU     Long User Work Unit
MAN      Exemplary Metropolitan Area Network
MANS     MAN Switch
MANSC    MAN Switch Controller
MINT     Memory and Interface Module
MMU      Memory Management Unit
NAK      Negative Acknowledge
NIM      Network Interface Module
OA&M     Operation, Administration and Maintenance
PASC     Phase Alignment and Scramble Circuit
SCC      Switch Control Complex
SUWU     Short User Work Unit
TCP      Transmission Control Protocol
TSA      Time Slot Assigner
UDP      User Datagram Protocol
UIM      User Interface Module
UWU      User Work Unit
VLSI     Very Large Scale Integration
VME bus  An IEEE Standard Bus
WAN      Wide Area Network
XL       External Link (Connects NIM to MINT)
XLH      External Link Handler
XPC      Crosspoint Controller



Claims 

1. A data switching network for connecting a plurality of inlets to a plurality of outlets, comprising:
circuit switch means for switchably connecting a plurality of inputs and said plurality of outlets; and
a plurality of data distribution means for assembling and chaining data packets from ones of said plurality of
inlets for transmission to one of said outlets and for transmitting said chained data packets to one of said
inputs of said circuit switch for connection to said one outlet.

2. The network of claim 1 wherein each of said data distribution means comprises:
a memory for storing incoming data packets;
a first plurality of microprocessors connected to ones of said plurality of inlets for controlling the storage of
header information of each of said data packets; and
a second plurality of microprocessors for processing said header information and queuing data packets
destined for a common outlet.

3. The network of claim 2 further comprising means operative under the control of said second plurality
of microprocessors for controlling transmission of said queued data packets destined for said common
outlet to one of said inputs of said circuit switch means.

4. The network of claim 1 wherein said data packets comprise voice packets.

5. A metropolitan area data switching network for switching data packets, comprising a central hub for
connecting a plurality of inlets to a plurality of outlets, said hub comprising:
a circuit switch for switchably connecting a plurality of inputs and said plurality of outlets;
a plurality of data distribution modules for assembling and chaining data streams, said data streams
comprising data and voice packets, from ones of said plurality of inlets for transmission to one of said
outlets and transmitting said chained data streams to one of said inputs of said circuit switch for connection
to said one outlet; and
means for concentrating data from a plurality of end user systems to a high-speed data link, connected to
one of said plurality of data distribution modules, said means for concentrating comprising means for
adding port identification data to said transmitted packet;
wherein each of said data distribution modules comprises:
a memory for storing incoming data packets;
a first plurality of microprocessors connected to said plurality of inlets for controlling storage of header
information of each of said data packets; and
a second plurality of microprocessors for processing said header information and chaining the data packets
destined for a common outlet;
means, operative under the control of said second plurality of microprocessors, for controlling transmission
of said chained data packets destined for said common outlet to one of said inputs; and
control means for verifying that a source, identified by a source identification, of each data packet is
authorized to transmit to a destination of that data packet and for verifying that said port identification is
authorized to transmit with said source identification.

6. The network of claim 5 further comprising:
a plurality of data concentration/distribution modules each for concentrating data traffic from a plurality of
end users to an inlet of said hub, and for distributing data traffic from an outlet of said hub to said plurality
of end users.

7. A data switch having a plurality of inlets and outlets, comprising:
a plurality of data distribution switch means, each for chaining groups of data packets received on ones of
said plurality of inlets connected to said each data distribution switch means and destined for one of said
plurality of outlets; and
circuit switch means connected to said data distribution switch means for setting up a circuit connection
from one of said data distribution switch means to one of said outlets for each of said groups of chained
packets.

8. In a data switching system, a method of transmitting data packets each to one of a plurality of outlets
comprising the steps of:
chaining groups of data packets destined for a common outlet; and
transmitting a request for a connection to a circuit switch for each chained group of data packets.

9. The data switching network of claim 1 wherein said circuit switch means comprises a plurality of
controllers each for controlling one of a plurality of disjoint sets of connections in said circuit switching
network.

10. The data switching network of claim 9 wherein said circuit switch means comprises a space division
network for switchably connecting said plurality of inputs and said plurality of outlets.

11. The method of claim 8 wherein said circuit switch comprises a plurality of controllers each for
controlling one of a disjoint set of connections of said circuit switch wherein said transmitting step
comprises the step of:
transmitting a request for a connection to one of said controllers of said circuit switch, said one controller
controlling a disjoint set of connections that includes said requested connection.

12. The method of claim 11 wherein said data switching system comprises a plurality of data switching
modules each connected to at least one inlet and one output and wherein said circuit switch connects each
of said outputs of said plurality of data switching modules to said plurality of outlets further comprising the
steps of:
in each of said plurality of data distribution modules, storing packets received on said at least one inlet;
determining an outlet for which each stored packet is to be transmitted and chaining data packets which are
to be transmitted to a common outlet;
receiving an indication that a requested connection has been established transmitted from one of said
controllers to one of said data switching modules; and
transmitting a chained group of data packets from said one of said data switching modules to said circuit
switch for transmission over said established requested connection.
13. A data switching system for switching data packets from a plurality of inlets to a plurality of outlets,
comprising:
a plurality of data switching means, each having at least one output for chaining data packets from ones of
said plurality of inlets to one of said plurality of outlets; and
circuit switching means connected to said plurality of data switching means for connecting outputs of said
plurality of data switching means to said plurality of outlets;
each of said data switching means comprising means for requesting of said circuit switching means a
connection between an output of said each data switching means and one of said plurality of outlets, said
means for requesting comprising high priority and low priority queues for storing requests to set up a
connection for transmitting a chain of data packets having high priority and low priority respectively.

14. The data switching system of claim 13, wherein said circuit switching means comprises at least one
controller, said at least one controller comprising queues for requests from ones of said plurality of data
switching modules, said queues comprising a queue for high priority requests and a queue for low priority
requests.

15. The data switching system of claim 14 wherein said packets comprise data for identifying high
priority packets and wherein said high priority requests comprise requests to switch a chain of packets
headed by a high priority packet.

16. A data switching system comprising:
a data concentration/distribution stage for concentrating data packets from a plurality of sources to one of a
plurality of duplex high-speed data links and for distributing data packets from one of said plurality of
duplex high-speed data links to a plurality of destinations; and
a hub for switching data packets among said plurality of high-speed data links;
wherein said hub comprises a plurality of data switching modules for switching data packets from ones of
said plurality of high-speed data links to outputs of each of said data switching modules and a circuit switch
for switching from said outputs of said data switching modules to ones of said plurality of high-speed data
links;
wherein each of said data switching modules comprises means for chaining data packets destined for a
common high-speed data link and for transmitting connection requests to said circuit switch;
wherein said circuit switch comprises at least one controller comprising queues for requests from ones of
said plurality of data switching modules, said queues comprising a queue for high priority requests and a
queue for low priority requests;
wherein said data packets comprise data for identifying high priority packets and wherein said high priority
requests comprise requests to switch a chain of packets headed by a high priority packet;
wherein each of said data switching modules comprises a queue for high priority circuit switch setup
requests and a queue for low priority circuit switch setup requests and comprises means for transmitting to
said at least one controller of said circuit switch requests from said queue for high priority requests before
transmitting requests from said queue for low priority requests.

17. In a data switching system, a method of transmitting data packets each to one of a plurality of
outlets, comprising the steps of:
chaining groups of data packets destined for a common outlet;
determining for each chained group of data packets whether said group is high priority or low priority;
transmitting a high priority request for a connection to a circuit switch for each chained group of data
packets having high priority; and
transmitting a low priority request for a connection to said circuit switch for each chained group of data
packets having low priority.

18. The data switching system of claim 13 wherein said packets comprise data for identifying high
priority packets and wherein said high priority requests comprise requests to switch a chain of packets
including at least one high priority packet.

19. The data switching system of claim 13 wherein each of said data packets is limited in length to a
predetermined number of bits.

20. The data switching system of claim 19 wherein said high priority requests further comprise requests
to switch a chain of packets including at least one high priority packet.

21. The data switching system of claim 16 wherein said data packets are limited in size to a
predetermined number of bits.

22. The method of claim 17 wherein said packets comprise data for identifying high priority packets and
wherein said determining step comprises the step of determining for each data packet of a chained group
of data packets whether said data packet is high priority and classifying said chained group of data packets
as high priority if any of said data packets of said chained group is classified as high priority.

23. The method of claim 17 wherein said packets comprise data for identifying high priority packets and
wherein said determining step comprises the step of determining for a first data packet of a chained group
of data packets whether said data packet is high priority and classifying said chained group of data packets
as high priority if a first of said data packets of said chained group is classified as high priority.

24. The method of claim 17 further comprising the steps of:

following said determining step, storing a high priority request for each group determined to be high priority
in a high priority request queue; and

for each chained group determined to be low priority, storing a low priority request in a low priority request
queue.

25. The method of claim 17 further comprising the step of: 

attempting to establish connections in said circuit switch in response to said high priority requests before 
attempting to establish connections in response to said low priority requests.
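
The chaining and priority steps recited in claims 17 and 22 through 25 can be illustrated with a short sketch. It is not the patent's implementation. The names Packet, chain_by_outlet, is_high_priority, build_request_queues and service_order are hypothetical, and representing a connection request by the outlet number alone is an assumption made for brevity.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Packet:
    outlet: int          # outlet of the circuit switch this packet is destined for
    high_priority: bool  # priority indicator carried in the packet (assumed a flag)
    payload: bytes = b""


def chain_by_outlet(packets):
    """Chain packets destined for a common outlet, keeping arrival order."""
    chains = defaultdict(list)
    for packet in packets:
        chains[packet.outlet].append(packet)
    return dict(chains)


def is_high_priority(chain, rule="any"):
    """rule="any" follows claim 22; rule="first" follows claim 23."""
    if rule == "any":
        return any(p.high_priority for p in chain)
    return chain[0].high_priority


def build_request_queues(packets, rule="any"):
    """Store one request per chained group in the matching queue (claim 24)."""
    high, low = deque(), deque()
    for outlet, chain in chain_by_outlet(packets).items():
        (high if is_high_priority(chain, rule) else low).append(outlet)
    return high, low


def service_order(high, low):
    """High priority requests are attempted before low priority ones (claim 25)."""
    return list(high) + list(low)
```

With four packets for outlets 3, 7, 3 and 9, of which only the second packet for outlet 3 is high priority, the "any" rule yields one high priority request for outlet 3 and low priority requests for outlets 7 and 9. The "first" rule classifies all three chains as low priority.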
26. A system for switching voice signals comprising:

means for converting said voice signals into voice packets; and

means, connected to said means for converting, for packet switching said voice packets, comprising:

a plurality of input packet handlers and a plurality of output packet handlers;

memory access means for controlling storing and reading of said voice packets, comprising a plurality of 
memory access controllers for storing consecutive words of a voice packet in consecutive members of a 
plurality of memory modules; and

means for distributing said voice packets from said plurality of input packet handlers to said plurality of
memory access controllers and for assembling said voice packets from said plurality of memory access
controllers to said plurality of output packet handlers.
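
The storage arrangement of claim 26, in which consecutive words of a packet go to consecutive memory modules, can be pictured with the following sketch. It is illustrative only: the functions store_packet and read_packet, the use of one dictionary per memory module, and the (packet_id, row) addressing are assumptions, not details taken from the claims.

```python
def store_packet(modules, packet_id, words):
    """Write consecutive words of a packet into consecutive memory modules."""
    n = len(modules)
    for i, word in enumerate(words):
        # Word i goes to module i mod n; the row counts how many times
        # the packet has wrapped around the set of modules.
        modules[i % n][(packet_id, i // n)] = word


def read_packet(modules, packet_id, length):
    """Reassemble a packet by reading the modules in the same rotating order."""
    n = len(modules)
    return [modules[i % n][(packet_id, i // n)] for i in range(length)]


# Four hypothetical memory modules, each modeled as a dictionary.
modules = [dict() for _ in range(4)]
store_packet(modules, packet_id=1, words=["w0", "w1", "w2", "w3", "w4", "w5"])
restored = read_packet(modules, packet_id=1, length=6)
```

Spreading the words this way lets each module be accessed at a fraction of the packet word rate, which is one plausible motivation for having a separate memory access controller per module.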

27. The system of claim 26, comprising a plurality of said means for converting and a plurality of said
means for packet switching, further comprising circuit switch means for switching said voice packets
between output packet handlers of a plurality of said means for packet switching and ones of a plurality of
communication paths, and wherein said means for packet switching said voice packets comprise means for
chaining voice packets in groups, each group for connection over one of said communication paths.

28. The system of claim 27 wherein ones of said plurality of communication paths are connectable to a
packet to digital voice signal converter.

29. The system of claim 28 wherein said means for converting said voice signals into voice packets is
comprised in a digital switching system connectable to customer stations;

said digital switching systems further comprising means for generating signaling information to said means
for converting for signaling terminal identification data for switching packets of a voice connection to a
customer station, and for generating signaling information to said means for converting for signaling the
identity of a requested customer station to a switch serving that requested customer station.

30. A network for switching first packets comprising data and second packets comprising voice signals,
comprising: 

first data switching means for switching said first and said second packets to first and second outputs
respectively;

circuit switching means connected to said first outputs for further switching said first packets; and

second data switching means connected to said second outputs for further switching said second packets.

31. A system for switching data and voice signals comprising:

digital switching means connectable to customer lines for generating digital speech signals; 
means for generating speech channel identification information;

means connected to said digital switching means for converting speech signals into voice packets and
responsive to said speech channel identification information for generating headers to said voice packets;

means for concentrating data traffic from and distributing traffic to said means for generating voice packets; 

means, connected via data links to said means for concentrating, for packet switching said voice packets
comprising:

a plurality of input packet handlers and a plurality of output packet handlers; 

memory means for storing said voice packets comprising a plurality of memory modules for storing
consecutive words of a voice packet;

means for chaining packets into groups destined for a common means for distributing and for
communicating said chaining data to said output packet handlers;

means, controlled by said input packet handlers, for distributing said voice packets from said plurality of
input packet handlers to said plurality of memory modules and, controlled by said output packet handlers,
for assembling said chained groups of voice packets from said plurality of memory modules to said plurality
of output packet handlers.

32. The system of claim 31 further comprising:

circuit switching means connected to said means for packet switching for groups of packets from said 
means for packet switching to ones of data links connected to said means for concentrating data

33. A method of switching voice and data packets comprising the steps of:

packet switching said voice packets received on inputs of a first packet switch means to first outputs of said
first packet switch means and said data packets to second outputs of said first packet switch means;

connecting said first outputs to a circuit switch means and said second outputs to a second packet switch
means.

34. A method of switching voice signals comprising the steps of:
converting said voice signals to voice packets; 

transmitting said voice packets to an input packet handler of a data switching means;

transmitting data from said input packet handler to a plurality of memory access controllers of said data
switching means for controlling storage of voice packets in a plurality of memory modules;
chaining packets into groups having a common intermediate destination; and 

transmitting each of said groups from said plurality of memory access controllers to an output data handler 
of said data switching means for further transmission to one of said intermediate destinations.

35. A network for switching first packets, comprising data, and second packets, comprising information
representing voice signals, from a plurality of inlets to a plurality of outlets, comprising:

first and second data switching means; and

circuit switching means;

said first data switching means for switching said first and said second packets received from said inlets to
said circuit switching means for further switching to said outlets and to said second data switching means,
respectively;

said circuit switching means responsive to said packets received from said first data switching means for
switching said first and second packets to said outlets and said second data switching means respectively;

said second data switching means responsive to said second packets received from said circuit switching
means for switching said second packets to said circuit switching means for further switching to said
outlets;

said circuit switching means further responsive to said second packets received from said second data
switching means for switching said packets to said outlets.

36. The network of claim 35 wherein each of said first and second data switching means comprise
means for generating control signals for selecting outlets and second data switching means and wherein
said circuit switching means is responsive to said control signals for switching a packet received from one
of said data switching means to an outlet or a second data switching means selected by a control signal
from said one of said data switching means.

37. The network of claim 36 wherein each of said data switching means comprise a plurality of data
switching modules, and wherein each of said data switching modules of said first data switching means
comprises means for chaining received first data packets destined for a common outlet and for chaining
received second data packets destined for a common one of said plurality of data switching modules of
said second data switching means, and means for generating control signals for controlling the switching by
said circuit switching means of said chained received packets to said common outlet or said one of said
plurality of switching modules of said second data switching means.

38. The network of claim 37 wherein each of said data switching modules of said second data switching
means comprises means for chaining received second data packets destined for another common outlet
and means for generating control signals for switching said chained received packets to said other common
outlet.

39. In a data switching system comprising circuit switching means and first and second data switching
means, a method for switching first packets comprising data and second packets comprising information
representing voice signals from a plurality of inlets to said first data switching means to a plurality of outlets,
comprising the steps of:

data switching said first packets, from said inlets to said first data switching means, to said circuit switching
means for further switching to said outlets;

data switching said second packets, from said inlets to said first data switching means, to said circuit
switching means for further switching to said second data switching means;

data switching said second packets in said second data switching means to said circuit switching means for
further switching to said outlets.
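
The two paths recited in claim 39 can be traced with a small sketch: first (data) packets cross the circuit switching means once, while second (voice) packets cross it twice, with the second data switching means in between. The Kind enumeration and the route function are hypothetical names used only for illustration. The sketch models the order of the stages and nothing else.

```python
from enum import Enum


class Kind(Enum):
    DATA = "first packet"    # first packets comprise data
    VOICE = "second packet"  # second packets represent voice signals


def route(kind):
    """Return the ordered list of stages a packet of the given kind traverses."""
    path = ["inlet", "first data switching means", "circuit switching means"]
    if kind is Kind.VOICE:
        # Voice packets are switched on to the second data switching means,
        # which returns them to the circuit switching means.
        path += ["second data switching means", "circuit switching means"]
    path.append("outlet")
    return path


data_path = route(Kind.DATA)
voice_path = route(Kind.VOICE)
```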

40. The method of claim 39 further comprising the steps of generating control signals in said first data
switching means for causing said circuit switching means to switch ones of said packets to outlets or said
second data switching means.

41. The method of claim 40 wherein said second data switching means comprises at least one module,
further comprising the steps of chaining first packets destined for a common outlet and chaining second
packets destined for a module of said second data switching means;